Method and apparatus for retrieving, accumulating and sorting table formatted data

ABSTRACT

This invention provides a method and apparatus for searching for and tabulating table-format data that not only has the functions of a conventional data table but also greatly increases the speed of searching for and tabulating large amounts of data. The method for searching for and tabulating table-format data represented as an array of records including fields containing field values for each field according to the present invention comprises: keeping in a storage device, a value control table containing field values in the order of field value numbers corresponding to field values belonging to a specific field, and a field value number-specifying information array containing information that specifies the field value numbers in the order of records, acquiring from the field value number-specifying information array the field value number corresponding to the specific record, and obtaining from the field values stored in the value control table the field value corresponding to the field value number thus acquired.

This application is related to and claims the early filing dates of Japanese Patent Application No. 10-227278, filed Aug. 11, 1998, Japanese Patent Application No. 10-338133, filed Nov. 27, 1998, and International Patent Application No. PCT/JP99/04300, filed Aug. 8, 1999. The entire disclosures of the above applications are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to a data processing method and data processing apparatus for processing large amounts of data using a computer or other information processing apparatus, and particularly to a method and apparatus for searching for, tabulating and sorting table-format data.

2. Description of the Prior Art

Conventionally, large amounts of data are accumulated, and searching, tabulating and other types of data processing are performed on the accumulated data. This data processing may be done using, for example, a known computer system including a CPU, memory, peripheral interface, a hard disk or other auxiliary storage device, a display, a printer or other output device, a keyboard, a mouse or other input device, and a power supply unit connected via a bus, and particularly as software that can be run on a readily available commercial computer system. In order to perform the aforementioned searching, tabulating or other types of data processing, various types of databases that particularly store large amounts of data are known. Among various types of large amounts of data, there is a particularly strong demand to process data that can be expressed in a table format. FIG. 1 is a diagram showing an example of expressing the data to be processed in a table format. FIG. 1 shows an example wherein the sex, age and occupation data for a large number of people, e.g. 1 million, are stored in a table. In FIG. 1, the horizontal rows in the table, namely the so-called records, consist of the record number, and the sex, age and occupation fields corresponding to the record number. The vertical columns in the table consist of the record number, sex field, age field and occupation field. The table indicates that the person with the record number of “0” has a sex of female, age of 18 and occupation of programmer. In the following explanation, the data such as “Female,” “18” and “Programmer” set in the various fields are called field values. In addition, in the following explanation, unless otherwise indicated, the table-format data consisting of 1 million records shown in FIG. 1 is used as a specific example of a large amount of data.

Whether or not large amounts of data can be searched for or tabulated efficiently depends on the format in which the large amount of data is stored. Conventionally, typical known storage techniques include the so-called “record-sequential” and “field-sequential” storage techniques shown in FIGS. 2A and 2B, respectively.

FIG. 2A and FIG. 2B show a representation of the data storage format on a storage device, e.g. a hard disk. In the case of the record-sequential storage technique in FIG. 2A, a set of the field values of sex, age and occupation for each record number is stored on disk in the order of increasing logical addresses sequentially for each record number. On the other hand, in the case of the field-sequential storage technique in FIG. 2B, for each field, the field values are stored in record number order grouped by field in the direction of increasing logical addresses. To wit, in the example of FIG. 2B, the field values for the sex field corresponding to record numbers “0” through “999999” are arranged in order, and next, the field values for the age field are arranged in record number order, and then the field values for the occupation field are arranged in record number order.
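The two prior-art layouts may be illustrated by the following minimal sketch, in which a flat Python list stands in for increasing logical addresses. The three sample records, the occupation values other than “Programmer,” and the names record_sequential, field_sequential, read_record_sequential and read_field_sequential are assumptions made only for illustration and do not appear in the drawings.

```python
# Hypothetical sample of table-format data: (sex, age, occupation) per record.
records = [
    ("Female", 18, "Programmer"),
    ("Male", 21, "Clerk"),
    ("Female", 20, "Teacher"),
]
NUM_FIELDS = 3

# Record-sequential layout: all field values of record 0, then record 1, ...
record_sequential = [value for rec in records for value in rec]

# Field-sequential layout: all sex values, then all age values, ...
field_sequential = [rec[f] for f in range(NUM_FIELDS) for rec in records]


def read_record_sequential(record_number, field_number):
    """Address of a field value in the record-sequential layout."""
    return record_sequential[record_number * NUM_FIELDS + field_number]


def read_field_sequential(record_number, field_number):
    """Address of a field value in the field-sequential layout."""
    return field_sequential[field_number * len(records) + record_number]
```

In either layout the field values themselves are stored once per record, which is the property the data control mechanism of the invention is intended to avoid.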

In the case of the aforementioned prior art, field values corresponding to all fields for all record numbers are stored as is in a two-dimensional data structure (with the record number as one dimension and the other field values as one dimension). Hereinafter, such a data structure in particular shall be referred to as a “data table.” In the case of the prior art, when searching for and tabulating stored data, this is performed by accessing such a data table.

In addition to the method of storing the value of the fields as field values as is, there is also a known method of converting the values to codes and storing the codes as field values. For example, with respect to the sex field, the value “Male” may be converted to “0” while the value “Female” is converted to “1” and then the values “0” or “1” are stored as the field values instead of “Male” or “Female.” Even in this case, it remains true that the converted codes are stored in a data table as field values.

In the case of searching for and tabulating large amounts of data stored using a data structure of the data table type in the aforementioned prior art, there is a problem in that the processing time for searching and tabulating becomes longer due to the access time required to access such data tables.

In addition, data tables have at least the following intrinsic drawbacks.

(1) The data tables easily become enormous in size and cannot be easily separated (physically) into individual fields. For example, when extracting records in which the sex is “Male,” the age and occupation information is unnecessary, so efficiency could be improved if the table could be separated into a table containing only the sex fields. In the case of the field-sequential storage technique shown in FIG. 2B, while separation into individual fields is simple, when large amounts of data are handled, the size of the data table still becomes enormous, so the actual expansion of a data table into memory or other fast storage device for the purpose of tabulating or searching is difficult.

(2) Data tables cannot be kept in a form with multiple field values sorted simultaneously. For example, in the case of the prior art illustrated in FIG. 2A and FIG. 2B, the field values for the sex field are arranged in record number order in the manner “Female, Male, Female, . . . , Female.” However, when performing searching and tabulating processes, it is typically convenient for them to be arranged in the manner “Female, Female, Female, Male, . . . , Male.” However, in table data, the field values are arranged in a specific matrix order, namely record number order, so sorting the field values on a specific field is not permitted. For this reason, in the case of the prior art, it is not possible to select an arrangement of field values that is convenient for searching and tabulating.

(3) In a data table, identical values appear over and over. For example, in the case of the conventional data table given in FIG. 2A and FIG. 2B, at the time of extracting records (or namely, record numbers) wherein the sex is “‘Male’ or ‘Man’,” because the field value “Male” appears many times, it is necessary to perform the matching operation between the comparison condition “‘Male’ or ‘Man’” and the field value “Male” many times. A single comparison should be sufficient to make the determination of whether there is a match with identical values.

In order to increase greatly the speed of searching for and tabulating large amounts of data, the object of the present invention is to provide a method of searching for, tabulating and sorting table-format data and an apparatus for implementing said method by providing a data control mechanism that both has the functions of the conventional data table and solves the aforementioned problems with the data structure based on the data table.

SUMMARY OF THE INVENTION

In order to achieve the aforementioned object, the method and apparatus for searching for and tabulating table-format data according to the present invention proposes a novel data control mechanism that is usable on an ordinary computer system. The data control mechanism according to the present invention comprises a value control table and an array of pointers to the value control table, as a general rule.

FIG. 3 is a diagram used to explain the principle of the present invention, showing a value control table 10 and an array of pointers to the value control table 20. A value control table 10 is defined to be a table made by assigning, for each field in table-format data, an (integral) field value number to each field value belonging to that field, and the table thus contains the field values corresponding to said field value number arranged in order of the field value numbers (reference number 11) along with a category number (reference number 12) which relates to said field value. An array of pointers to the value control table 20 is defined to be an array containing pointers to the field value numbers of the columns (namely, the fields) in the table-format data, namely to the value control table 10, arranged in order of the record numbers of the table-format data.

By combining the array of pointers to the value control table 20 with the value control table 10, given a certain record number, it is possible to use the array of pointers to the value control table 20 pertaining to the field in question to extract the stored field value number corresponding to that record number, and next, extract the stored field value corresponding to that field value number within the value control table 10, and thus obtain the field value from the record number. Therefore, in the same manner as with a conventional data table, it is possible to access all data (field values) with coordinates consisting of the record number (row) and field (column).
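The two-step access described above may be illustrated by the following minimal sketch. The function names build_information_block and field_value, the five-record sample column, and the choice to number the field values in sorted order are assumptions made for illustration; the description above requires only that each distinct field value receive an integral field value number.

```python
def build_information_block(column):
    """Build (value control table, array of pointers) for one field.

    column: the field values of one field, in record number order.
    """
    # Value control table: distinct field values, indexed by field value number.
    # Sorted order is an assumption made here for illustration.
    value_table = sorted(set(column))
    number_of = {value: number for number, value in enumerate(value_table)}
    # Array of pointers: one field value number per record, in record order.
    pointer_array = [number_of[value] for value in column]
    return value_table, pointer_array


def field_value(block, record_number):
    """Record number -> field value number -> field value."""
    value_table, pointer_array = block
    field_value_number = pointer_array[record_number]
    return value_table[field_value_number]


# Hypothetical sex field for five records.
sex_column = ["Female", "Male", "Female", "Female", "Male"]
sex_block = build_information_block(sex_column)
```

With this sample, the value control table holds only two entries regardless of the number of records, and every record is still reachable by its record number.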

The data control mechanism according to the present invention, which includes a value control table generated for a certain field within the fields of table-format data and an array of pointers to the value control table, may also be referred to in particular as an “information block” in the following explanation.

While conventional data tables offer the integrated control of all data using the coordinates of the rows corresponding to records and columns corresponding to fields, the information blocks according to the present invention are characterized in that the data are completely separated by column in the table format, namely by field. In this manner, by means of the present invention, large amounts of data are separated by field, so it is possible to load only that data related to those fields required for searching or tabulating into memory or other high-speed storage device, and as a result, the access time to the data is reduced, so the searching and tabulating processes are sped up, and even extremely large amounts of data can be handled without adversely affecting performance.

In addition, in the case of the information blocks according to the present invention, the field values are stored in the value control table, and the record numbers that indicate the position of the value are associated with the array of pointers to the value control table, so there is no need for the field values to be arranged in record number order. Therefore, data can be sorted on field values such that it is suited to searching and tabulating. Thereby, the determination of whether or not a field value matching the target value is present in the data can be performed at high speed. Furthermore, corresponding field value numbers are assigned to the field values, so even if the field values consist of long data or text strings, they can be handled as integers.

Moreover, by means of the present invention, all of the field value numbers of the value control table 10 correspond to different field values, so the number of comparison operations between a specific value and the field values which are required to extract a record containing a field value having that specific value is no more than the number of possible field values, namely the number of field value numbers, so the number of comparison operations is greatly decreased, thus speeding up searching and tabulating. At this time, while a location is required to store the results of determining whether or not a certain field value matches, for example, the category number 12 can be used as this storage location.
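The reduction in comparison operations may be illustrated by the following minimal sketch, which evaluates the condition “‘Male’ or ‘Man’” once per field value number and stores the outcome in a flag array that plays the role suggested above for the category number. The sample data, including the presence of “Man” as a distinct field value, and the names flags, comparisons and matching_records are assumptions made for illustration.

```python
# Hypothetical information block for the sex field.
value_table = ["Female", "Male", "Man"]   # indexed by field value number
pointer_array = [0, 1, 0, 2, 1, 1]        # field value number per record

targets = {"Male", "Man"}                 # comparison condition

# One comparison per distinct field value, not one per record.
comparisons = 0
flags = []
for value in value_table:
    comparisons += 1
    flags.append(1 if value in targets else 0)

# Per record, only an integer lookup of the stored flag is needed.
matching_records = [
    record for record, number in enumerate(pointer_array) if flags[number]
]
```

Here three comparisons suffice for six records; with 1 million records the number of comparisons would still be three.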

FIG. 4 shows the information block according to the present invention which comprises a value control table 10 including an array of field values 11 containing the field values, an array of category numbers 12 containing the category numbers, and an array of counts 14 containing the counts. The array of counts 14 contains numbers which indicate a count of the number of times each field value is present within all data in a certain field, or in other words, the number of records which have a stipulated field value. By preparing such an array of counts 14 within the value control table 10, the information “(how many instances of) which data is present?” required at the time of searching or tabulating can be obtained immediately, thus speeding up searching and tabulating.
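The array of counts may be prepared in a single pass over the array of pointers, as in the following minimal sketch. The sample data and the variable names are assumptions made for illustration.

```python
# Hypothetical information block for the sex field.
value_table = ["Female", "Male"]     # indexed by field value number
pointer_array = [0, 1, 0, 0, 1]      # field value number per record

# Array of counts: number of records having each field value.
counts = [0] * len(value_table)
for field_value_number in pointer_array:
    counts[field_value_number] += 1

# "How many instances of which data is present?" is then a direct lookup.
count_by_value = dict(zip(value_table, counts))
```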

FIG. 5 shows an information block including a value control table 10, array of pointers to the value control table 20 and an array of pointers to records 30. The array of pointers to records 30 is defined to be an array containing, for each field value number, namely each field value, pointers to records that have that field value (corresponding to the record number). The number of pointers contained in the array of pointers to records 30 for each field value matches the corresponding entry in the array of counts 14 in the value control table 10. In addition, an array of start positions 13, which specifies the starting address within the array of pointers to records 30 of the group of pointers for each field value, may be provided. By providing such an array of pointers to records 30 in the information block in this manner, a set of records that have a particular field value for a certain field can be extracted quickly. The count (reference number 14) and start positions (reference number 13) of pointers stored in the array of pointers to records 30 are set in the value control table 10, so it is an advantage that the values and counts are present in the information block as is and are usable at the time of tabulating.
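The relationship among the array of counts, the array of start positions and the array of pointers to records may be illustrated by the following minimal sketch. The sample data and the names start, pointers_to_records and records_with are assumptions made for illustration; the array of pointers to records is obtained here with Python's stable built-in sort purely for brevity.

```python
from itertools import accumulate

# Hypothetical information block for the sex field.
value_table = ["Female", "Male"]
pointer_array = [0, 1, 0, 0, 1]      # field value number per record

# Array of counts, one entry per field value number.
counts = [pointer_array.count(n) for n in range(len(value_table))]

# Array of start positions: exclusive prefix sums of the counts.
start = [0] + list(accumulate(counts))[:-1]

# Array of pointers to records: record numbers grouped by field value number.
pointers_to_records = sorted(
    range(len(pointer_array)), key=lambda record: pointer_array[record]
)


def records_with(field_value_number):
    """Return the record numbers that have the given field value."""
    begin = start[field_value_number]
    return pointers_to_records[begin:begin + counts[field_value_number]]
```

Given a field value number, the matching records are obtained as one contiguous slice without scanning the array of pointers to the value control table.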

Here follows an explanation of the method of searching for and tabulating table-format data according to the present invention. Note that in the following explanation, the individual field information refers to the aforementioned “information block,” and the field value number-specifying information array refers to the aforementioned “array of pointers to the value control table” while the record identifying information array refers to the aforementioned “array of pointers to records.”

When table-format data is represented as an array of records including a plurality of fields containing field values for each field, the method of extracting from the table-format data the field value corresponding to a specific field and a specific record according to the present invention comprises the steps of:

keeping in a storage device, for each individual field, a value control table containing field values which are located in order of field value number each corresponding to the field value belonging to the specific field, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,

acquiring from the field value number-specifying information array the field value number corresponding to the specific record, and

obtaining from the field values stored in the value control table the field value corresponding to the field value number acquired as above.

In addition, with the method of obtaining field values according to the present invention, in order to categorize the field values corresponding to the field value number, category numbers are stored in the value control table corresponding to the field value number, and the category numbers are accessed at the time of obtaining the field value corresponding to the field value number.

When table-format data is represented as an array of records including field values with respect to a field associated with a search condition, a single-field search method of searching through said table-format data for field values that match a specific search condition comprises the steps of:

keeping in a storage device, for each individual field, individual field information that includes a value control table containing field values which are located in order of field value numbers each corresponding to the field value belonging to the field associated with the search condition, a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, and a record identification information array storing in exclusive areas for each of said field value numbers one or more pieces of record identification information related to identical field value numbers, wherein said value control table includes, for each of said field value numbers, record identification information-specifying information that indicates the area where said one or more pieces of record identification information related to identical field value numbers are stored in said record identification information array, and

using said record identification information-specifying information corresponding to field value numbers related to field values within the field values contained in said value control table that match said search conditions, to acquire record identification information from said record identification information array that matches said search conditions.
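The single-field search may be illustrated by the following minimal sketch, in which the start positions and counts serve as the record identification information-specifying information. The dictionary layout of the block, the sample age data and the function name single_field_search are assumptions made for illustration.

```python
# Hypothetical information block for the age field of five records
# (record 0 is 18, record 1 is 21, record 2 is 20, record 3 is 24, record 4 is 20).
age_block = {
    "values": [18, 20, 21, 24],      # value control table
    "counts": [1, 2, 1, 1],          # records per field value number
    "start": [0, 1, 3, 4],           # start position per field value number
    "records": [0, 2, 4, 1, 3],      # record identification information array
}


def single_field_search(block, predicate):
    """Return record numbers whose field value satisfies predicate.

    The predicate is evaluated once per field value number; the matching
    records are then copied from the exclusive area of that number.
    """
    result = []
    for number, value in enumerate(block["values"]):
        if predicate(value):
            begin = block["start"][number]
            end = begin + block["counts"][number]
            result.extend(block["records"][begin:end])
    return result


# Search condition: age is 20 or 21.
result_set = single_field_search(age_block, lambda age: 20 <= age <= 21)
```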

In addition, the multiple-field search method according to the present invention comprises the steps of:

keeping in a storage device the result set of records that match the search conditions obtained by the single-field search method according to the present invention,

selecting separate individual field information regarding fields related to separate search conditions,

acquiring from the field value number-specifying information array regarding the separate individual field information the field value number corresponding to the pieces of record identification information that match the search conditions set in the result set,

regarding the separate individual field information, determining whether or not the field values identified by the extracted field value numbers match the separate search conditions, and

regarding the separate individual field information, if the field values identified by the extracted field value numbers match the separate search conditions, extracting the pieces of record identification information corresponding to the field value numbers as pieces of record identification information that match the separate search conditions.
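The narrowing of a result set by a separate search condition may be illustrated by the following minimal sketch. The function name narrow, the occupation sample data other than “Programmer,” and the starting result set are assumptions made for illustration; evaluating the condition once per field value number before visiting the records is one possible way of performing the determining step.

```python
def narrow(result_set, value_table, pointer_array, predicate):
    """Keep only those records of result_set that also match predicate.

    value_table and pointer_array belong to the separate individual
    field information (a different field from the first search).
    """
    # Determine the match once per field value number.
    flags = [predicate(value) for value in value_table]
    # For each record already in the result set, look up its field value
    # number in the separate field and keep it if that number is flagged.
    return [record for record in result_set if flags[pointer_array[record]]]


# Result set from a hypothetical earlier single-field search.
first_result = [2, 4, 1]

# Hypothetical information block for the occupation field of five records.
occupation_values = ["Clerk", "Programmer"]
occupation_pointers = [1, 1, 1, 0, 0]

and_result = narrow(
    first_result,
    occupation_values,
    occupation_pointers,
    lambda occupation: occupation == "Programmer",
)
```

Only the records already in the result set are visited, so each additional condition of an AND search costs no more than the size of the current result set.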

Alternately, as a variation to the multiple-field search method, it is possible to implement a so-called OR search. In more detail, this method comprises the steps of:

keeping in a storage device the result set of records that match the search conditions,

regarding other individual field information related to other search conditions,

using field values within the field values stored in other value control tables that match the search conditions and record identification information-specifying information corresponding to related field values to extract from a record identification information array the records that match the other search conditions, and storing the records that match the search conditions in a specified other result set,

if necessary, regarding still other search conditions, using still other record identification information-specifying information to extract records that match still other search conditions, and repeating the storage of still other result sets, and

obtaining a final result set by eliminating duplicate records from the result sets thus obtained.
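The final step of the OR search, namely the elimination of duplicate records from the individual result sets, may be illustrated by the following minimal sketch. The function name or_search and the sample result sets are assumptions made for illustration, and the use of a hash set to detect duplicates is merely one possible technique, not one stated in the present description.

```python
def or_search(result_sets):
    """Merge several result sets, keeping each record number only once."""
    seen = set()
    final_result = []
    for result_set in result_sets:
        for record in result_set:
            if record not in seen:
                seen.add(record)
                final_result.append(record)
    return final_result


# Hypothetical result sets from two single-field searches on different fields.
by_age = [2, 4, 1]
by_occupation = [0, 2]

final_result = or_search([by_age, by_occupation])
```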

When table-format data is represented as an array of records including a plurality of fields containing field values for each field, the method of tabulating the table-format data by each field value according to the present invention comprises the steps of:

if n represents an integer equal to 1 or greater, for each of n fields used in tabulation, keeping in a storage device individual field information including a value control table containing field values for that field corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,

if i represents an integer in the range 1≦i≦n, for the i^(th) individual information field, the total number of the field value numbers is represented by N_(i), k_(i) represents an integer in the range 0≦k_(i)≦N_(i)−1, M represents an integer equal to 1 or greater, and if m is an integer in the range 1≦m≦M, then initializing elements P_(m)(k₁, k₂, . . . , k_(i), . . . , k_(n)) of n-dimensional M data spaces having a size of N₁×N₂× . . . ×N_(i)× . . . ×N_(n),

for the n individual information fields, when j represents an integer in the range 0≦j≦(total number of records)−1, extracting the respective field value numbers stored in the j^(th) position in each field value number-specifying information array, and when the field value number extracted from the i^(th) individual information field is represented by q_(i), identifying the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)) of the data space, and

processing the identified values of the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)).

When table-format data is represented as an array of records including a plurality of fields containing field values for each field, the method of tabulating the table-format data by the category of field values according to the present invention comprises the steps of:

if n represents an integer equal to 1 or greater, for each of n fields used in tabulation, keeping in a storage device individual field information including a value control table containing field values for that field and the category number of the field value corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,

if i represents an integer in the range 1≦i≦n, for the i^(th) individual information field, the total number of either the field value numbers or the category numbers is represented by N_(i), k_(i) represents an integer in the range 0≦k_(i)≦N_(i)−1, M represents an integer equal to 1 or greater, and if m is an integer in the range 1≦m≦M, then initializing elements P_(m)(k₁, k₂, . . . , k_(i), . . . , k_(n)) of n-dimensional M data spaces having a size of N₁×N₂× . . . ×N_(i)× . . . ×N_(n),

for the n individual information fields, when j represents an integer in the range 0≦j≦(total number of records)−1, extracting the respective field value numbers stored in the j^(th) position in each field value number-specifying information array, and when the field value number extracted from the i^(th) individual information field or the category number stored corresponding to the field value number in the value control table of the i^(th) individual information field is represented by q_(i), identifying the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)) of the data space, and

processing the identified values of the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)).
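Tabulation by category may be illustrated by the following minimal sketch for n=1 and M=1, in which the category number stored for each field value number is used as q₁. The age brackets, the sample data, and the choice of counting (adding 1) as the processing step are assumptions made for illustration.

```python
# Hypothetical information block for the age field.
age_values = [18, 20, 21, 24]    # value control table, by field value number
age_category = [0, 1, 1, 1]      # category number: 0 = under 20, 1 = 20 or over
age_pointers = [0, 2, 1, 3, 1]   # field value number per record

# N_1 is the total number of category numbers; initialize the data space.
N1 = max(age_category) + 1
P = [0] * N1

for j in range(len(age_pointers)):
    field_value_number = age_pointers[j]
    q1 = age_category[field_value_number]   # category number used as q_1
    P[q1] += 1                              # processing step assumed: counting
```

Because the category number is looked up through the field value number, no field value needs to be compared against the bracket boundaries while the records are being scanned.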

With the method of tabulating counts in particular according to the present invention, M=1 is true, and the step of processing the value of the identified element P_(m) includes adding 1 to the current value of the element P_(m).
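The tabulation of counts may be illustrated by the following minimal sketch of a cross-tabulation with n=2 and M=1. The sample pointer arrays and the names sex_pointers, age_pointers and P are assumptions made for illustration; a nested list stands in for the two-dimensional data space of size N₁×N₂.

```python
# Hypothetical field value number-specifying information arrays (five records).
sex_pointers = [0, 1, 0, 0, 1]   # N_1 = 2 field value numbers
age_pointers = [0, 2, 1, 3, 1]   # N_2 = 4 field value numbers
N1, N2 = 2, 4

# Initialize the elements P(k1, k2) of the N_1 x N_2 data space to zero.
P = [[0] * N2 for _ in range(N1)]

# For each record j, extract (q1, q2) and add 1 to the identified element.
for j in range(len(sex_pointers)):
    q1 = sex_pointers[j]
    q2 = age_pointers[j]
    P[q1][q2] += 1
```

Only integer field value numbers are read during the scan; the field values themselves are needed only when the finished table is labeled for display.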

In addition, with the method of tabulating statistical quantities according to the present invention, the step of processing the value of the identified element P_(m) comprises: for at least one element P_(m) among the M elements P_(m),

for separate individual field information kept in a storage device, acquiring the field value numbers stored in the j^(th) position in the field value number-specifying information array,

from among the field values stored in the value control table of the separate individual field information, acquiring the field value corresponding to the field value number thus acquired, and updating the value of the element P_(m) by combining the current value of the element P_(m) with the field value thus obtained.
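The tabulation of statistical quantities may be illustrated by the following minimal sketch with n=1 and M=2, in which one data space holds a count and the other holds a running sum of the age taken from a separate information block. The sample data, the names count and total, and the choice of sum and mean as the statistical quantities are assumptions made for illustration.

```python
# Field used for tabulation: hypothetical sex field (N_1 = 2).
sex_pointers = [0, 1, 0, 0, 1]

# Separate individual field information: hypothetical age field.
age_values = [18, 20, 21, 24]    # value control table
age_pointers = [0, 2, 1, 3, 1]   # field value number per record

# Two data spaces (M = 2) of size N_1: a count and a running sum.
count = [0, 0]
total = [0, 0]

for j in range(len(sex_pointers)):
    q1 = sex_pointers[j]
    # Acquire the field value of the separate field for record j.
    age = age_values[age_pointers[j]]
    # Combine the current element values with the field value obtained.
    count[q1] += 1
    total[q1] += age

mean_age = [t / c for t, c in zip(total, count)]
```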

With the present invention, the information that specifies the field value number may be the field value number itself.

Alternately, in order to implement the so-called multi-answer fields wherein multiple field values are allocated to one field of a certain record, with the present invention, the information that specifies the field value number may be a binary value wherein 1 bit is allocated to each field value number, each bit indicating whether or not the corresponding field value is set.
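The bit-per-field-value-number representation for multi-answer fields may be illustrated by the following minimal sketch. The field values, the sample masks, the assignment of bit n to field value number n, and the function names values_of and has_value are assumptions made for illustration.

```python
# Hypothetical multi-answer field with three possible field values.
value_table = ["Reading", "Sports", "Travel"]   # field value numbers 0, 1, 2

# One binary value per record; bit n set means field value number n applies.
pointer_array = [0b001, 0b110, 0b101]


def values_of(record_number):
    """Return all field values allocated to the given record."""
    mask = pointer_array[record_number]
    return [
        value
        for number, value in enumerate(value_table)
        if (mask >> number) & 1
    ]


def has_value(record_number, field_value_number):
    """Test whether a record has the given field value number set."""
    return bool((pointer_array[record_number] >> field_value_number) & 1)
```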

In addition, when table-format data is represented as an array of records including a plurality of fields containing field values for each field, the apparatus for searching for and tabulating the table-format data according to the present invention comprises:

a storage device for keeping, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,

means of acquiring from the field value number-specifying information array kept on the storage device the field value number corresponding to the specific record, and

means of obtaining from the field values stored in the value control table kept on the storage device the field value corresponding to the field value number acquired as above.

When table-format data is represented as an array of records including a plurality of fields containing field values for each field, the storage medium upon which is recorded a program for searching for and tabulating the table-format data according to the present invention is recorded with a program characterized in comprising:

a step of keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies the field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies the field value numbers in the order of the records,

a step of acquiring from the field value number-specifying information array kept on the storage device the field value number corresponding to the specific record, and

a step of obtaining from the field values stored in the value control table kept on the storage device the field value corresponding to the field value number acquired as above.

The present invention also proposes a sorting method whereby an array of record identification information, e.g. record numbers, specifying records including a plurality of fields containing field values corresponding to fields of information is rearranged on a specific field. With the sorting method according to the present invention, an array of pointers to the value control table is formed wherein, for each record, record identification information is associated with the field value number corresponding to the field values of a certain field. Next, for each of the field value numbers, the storage location after reordering said record identification information is defined. Said record identification information is sequentially extracted from the array, and said field value number corresponding to said record identification information thus extracted is determined, the record identification information thus extracted is stored in said storage location according to the record identification information-specifying information corresponding to the field value number thus determined, and the storage location where the record identification information is to be stored is updated in order to store the next record identification information.

A preferred embodiment of the sorting method according to the present invention comprises the steps of keeping in a storage device individual field information including a value control table containing field values in the order of field value numbers corresponding to field values for a field value associated with search conditions, and a field value number-specifying information array containing information that specifies field value numbers in the order of the records, where the value control table further includes record identification information-specifying information that, for each field value number, indicates the area in said record identification information-specifying information array where said one or more pieces of record identification information regarding identical field value numbers are stored, and is constituted such that record identification information is stored at storage locations according to the record identification information-specifying information.
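The sorting method may be illustrated by the following minimal sketch, in which a per-field-value-number cursor serves as the storage location that is updated after each record is stored. The function name sort_records, the sample data, and the assumption that the field value numbers are assigned in the desired sort order of the field values are made for illustration.

```python
def sort_records(record_ids, pointer_array, number_of_values):
    """Rearrange record_ids in order of field value number.

    pointer_array[r] gives the field value number of record r.
    Records with identical field value numbers keep their input order.
    """
    # Count the records for each field value number.
    counts = [0] * number_of_values
    for record in record_ids:
        counts[pointer_array[record]] += 1

    # Define the first storage location for each field value number.
    cursor = []
    position = 0
    for count in counts:
        cursor.append(position)
        position += count

    # Extract records sequentially, store each at the current location for
    # its field value number, then advance that location for the next record.
    output = [None] * len(record_ids)
    for record in record_ids:
        number = pointer_array[record]
        output[cursor[number]] = record
        cursor[number] += 1
    return output


# Hypothetical sex field: field value number 0 = "Female", 1 = "Male".
sex_pointers = [1, 0, 1, 0, 0]
sorted_records = sort_records([0, 1, 2, 3, 4], sex_pointers, 2)
```

No field values are compared during the rearrangement; each record is placed with two array accesses, so the work grows linearly with the number of records.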

Moreover, the objects of the present invention may be achieved by an apparatus for implementing the aforementioned methods, a computer-readable storage medium containing a program according to this method, or a computer-loadable program product according to the method in question.

BRIEF EXPLANATION OF THE DRAWINGS

These and other objects of the present invention will be made clear in reference to the appended drawings and embodiments. Here,

FIG. 1 is an explanatory diagram illustrating typical table-format data.

FIGS. 2A and 2B are explanatory diagrams illustrating table-format datastorage techniques in the prior art.

FIG. 3 is an explanatory diagram illustrating the principle of thepresent invention.

FIG. 4 is an explanatory diagram illustrating an information blockaccording to the present invention.

FIG. 5 is an explanatory diagram illustrating an information blockaccording to the present invention.

FIG. 6 is an explanatory diagram illustrating an information blockregarding “sex” used in an embodiment of the present invention.

FIG. 7 is an explanatory diagram illustrating an information blockregarding “age” used in an embodiment of the present invention.

FIG. 8 is an explanatory diagram illustrating an information blockregarding “sex” used in an embodiment of the present invention.

FIG. 9 is a flowchart of the operation of the method of searching within a single field according to Embodiment 1 of the present invention.

FIG. 10 is an explanatory diagram illustrating an information block according to Embodiment 1 of the present invention.

FIG. 11 is an explanatory diagram illustrating an information block according to Embodiment 1 of the present invention.

FIG. 12 is a flowchart of the operation of the method of searching upon an AND of multiple fields according to Embodiment 2 of the present invention.

FIG. 13 is an explanatory diagram illustrating an information block according to Embodiment 2 of the present invention.

FIG. 14 is an explanatory diagram illustrating an information block according to Embodiment 2 of the present invention.

FIG. 15 is an explanatory diagram illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention.

FIG. 16 is an explanatory diagram illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention.

FIG. 17 is a flowchart of the operation of the method of tabulating according to Embodiment 5 of the present invention.

FIG. 18 is a conceptual explanatory diagram of Embodiment 6 of the present invention.

FIG. 19 is a flowchart of the operation of Embodiment 6 of the present invention.

FIG. 20 is a flowchart of the operation of cross-tabulating according to Embodiment 7 of the present invention.

FIG. 21 is an explanatory diagram illustrating an information block according to Embodiment 8 of the present invention.

FIG. 22 is a flowchart of the operation of cross-tabulating according to Embodiment 9 of the present invention.

FIGS. 23A and 23B are conceptual explanatory diagrams of a cross-tabulation table.

FIG. 24 is an explanatory diagram illustrating multi-answer type fields.

FIG. 25 is an explanatory diagram illustrating an information block of a type compatible with multi-answer type fields according to Embodiment 10 of the present invention.

FIG. 26 is an explanatory diagram illustrating the method of handling special values according to Embodiment 11 of the present invention.

FIG. 27 is a flowchart of the operation of the method of searching upon multiple fields according to Embodiment 12 of the present invention.

FIG. 28 is a structural diagram of a searching and tabulating system for table-format data based on one embodiment of the present invention.

FIG. 29 is an explanatory diagram illustrating the method of constructing an information block.

FIG. 30 is an explanatory diagram illustrating the preparation for data population and initialization.

FIG. 31 is an explanatory diagram illustrating the first pass of data population.

FIG. 32 is an explanatory diagram illustrating the second pass of data population.

FIG. 33 is an explanatory diagram illustrating the third pass of data population.

FIG. 34 is an explanatory diagram illustrating the third pass of data population.

FIG. 35 is an explanatory diagram illustrating the third pass of data population.

FIG. 36 is an explanatory diagram illustrating the addition of data to an information block.

FIG. 37 is a diagram illustrating the structure of an information block according to another embodiment of the present invention.

FIG. 38 is an explanatory diagram illustrating the initial state of sorting according to Embodiment 13 of the present invention.

FIG. 39 is an explanatory diagram illustrating the first step of sorting according to Embodiment 13 of the present invention.

FIG. 40 is an explanatory diagram illustrating the second step of sorting according to Embodiment 13 of the present invention.

FIG. 41 is an explanatory diagram illustrating the final state of sorting according to Embodiment 13 of the present invention.

FIG. 42 is an explanatory diagram illustrating sorting on a partial set.

FIG. 43 is an explanatory diagram illustrating the post-processing of sorting on a partial set.

FIG. 44 is an explanatory diagram illustrating the 1 million records of data used in the searching and tabulating tests.

FIG. 45 is an explanatory diagram illustrating the results of measurement of the searching and tabulating tests on 1 million records of data.

FIGS. 46A and 46B are flowcharts illustrating the OR search process on multiple fields as a variation of Embodiment 2 of the present invention.

FIG. 47 is a flowchart illustrating the searching process according to Embodiment 3 of the present invention.

FIG. 48 is a flowchart illustrating the tabulating process according to Embodiment 4 of the present invention.

FIG. 49 is a flowchart illustrating the tabulating process according to Embodiment 7 of the present invention.

FIG. 50 is a flowchart illustrating the sorting process according to Embodiment 13 of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

In order for the present invention to be better understood, we shall use the table-format data illustrated in FIG. 1 as an example of data, and make a detailed description of the search method, tabulating method and sorting method according to the present invention in various embodiments. The data illustrated in the example of FIG. 1 includes the fields of “sex,” “age” and “occupation,” so as shown in the individual figures in FIGS. 6-8, the information blocks obtained are an information block regarding “sex,” an information block regarding “age” and an information block regarding “occupation.” The following description assumes a situation wherein these information blocks are obtained. Note that while one technique of constructing the information blocks will be described later, the present invention is in no way limited by the method of constructing the information blocks.

As described later, the apparatus for searching for and tabulating table-format data according to an embodiment of the present invention is provided with the structure shown in FIG. 28. As shown in FIG. 28, the apparatus for searching for and tabulating table-format data is implemented by means of a computer system such as an ordinary personal computer. This computer system includes a CPU 100 that executes programs to control the entire system and its individual constituent components, ROM (Read Only Memory) 110 that stores programs and the like, RAM (Random Access Memory) 120 that stores working data and the like, a hard disk storage device 130, a display device 140, and a keyboard, mouse or other input device 150. The CPU 100, ROM 110, RAM 120, and the like are connected to each other via a bus 160. Other components that may also be connected to the bus include a CD-ROM drive (not shown) for accessing CD-ROM discs, an external network (not shown), an interface (not shown) provided to connect external terminals, and the like.

The program that performs the searching and tabulating (and also, depending on the case, sorting) of table-format data may be contained on CD-ROM (not shown) and read by a CD-ROM drive (not shown), or stored in advance in ROM 110. In addition, once read from CD-ROM, the program may also be stored in a specific area of the hard disk storage device 130. Alternately, the aforementioned program may also be supplied from outside via the network, external terminals or interface (none of these are shown).

In addition, in the aforementioned search and tabulating apparatus, in order to execute searching and tabulating (and also, depending on the case, sorting) processes on table-format data, as described later, it is necessary to generate an information block of a stipulated data format based on the table-format data. This information block generation program may be similarly contained on CD-ROM, stored in ROM 110, or stored on the hard disk storage device 130. Alternately, the aforementioned program may also be supplied from outside via the network, external terminals or interface (none of these are shown). In addition, in this embodiment, the data (information blocks) generated by the aforementioned information block generation program are stored in RAM 120 or in a specific area of the hard disk storage device 130.

Here follows a description of the method of searching on a single field according to Embodiment 1 of the present invention, in the case of searching for records wherein the value of the “age” field is “16” or “19.” FIG. 9 is a flowchart of the operation of the method of searching within a single field. This is implemented by the CPU 100 executing the search program acquired by the aforementioned procedure and stored in a stipulated area.

First, from among the information blocks regarding table-format data, select the information block regarding “age” shown in FIG. 7 as the specific information block (Step 100).

Next, set “1” in the category number of those rows in which the field value within the value control table of the specific information block matches “16” or “19,” which is the aforementioned search condition, and set “0” in the category number of other rows (Step 102). In the case of this example, as shown in FIG. 10, “1” is set in the category number of those rows corresponding to a field value number of “0” and field value number of “3.”

Next, the start positions and counts corresponding to the rows wherein the category number is set to “1” (namely, the rows to which the field value numbers of “0” and “3” are applied) are acquired as pointer extraction information (Step 104). In the case of this example, the field value number of “0” has a corresponding start position of “0” and count of “45898.” On the other hand, the field value number of “3” has a corresponding start position of “238137” and count of “189653.”

By extracting from the array of pointers to records the number of pointers specified by the aforementioned start position and count, the record numbers that represent pointers to the records matching the aforementioned search conditions are extracted (Step 106). In the case of this example, as shown in FIG. 10, one can see that the pointers to records corresponding to the field value number of “0” are stored in the array of pointers to records at locations from the start position of “0,” or namely the beginning, up until the 45898th location, while the pointers to records corresponding to the field value number of “3” are stored in the array of pointers to records at 189653 locations starting from the 238137th location. For example, when accessing the table-format data in FIG. 1, the “age” corresponding to the record with the last record number of “999999” is “16,” so as shown in FIG. 11, the last pointer among the stored pointers within the array of pointers to records which correspond to a field value number of “0,” or namely an “age” of “16,” is “999999.”

Finally, in order to be used in subsequent processing, an array of the extracted record numbers is created as a result set and saved (Step 108).

With the present invention, it is possible to implement not only searches on a single field as described above, but also searches on an AND of multiple fields. Here follows a description of the method of performing searches on an AND of multiple fields according to Embodiment 2 of the present invention. In this example, we shall consider the case of obtaining a set of records that satisfy both the first search condition of the “age” being “16” or “19” and the second search condition of the “occupation” being “Student.” FIG. 12 is a flowchart of the operation of the method of searching upon an AND of multiple fields.

As described previously, for the first specific information block, which is the information block regarding “age,” the first field, a result set of records wherein the “age” is “16” or “19” is obtained by means of the processing according to Embodiment 1 (Step 120). Therefore, the processing of this Step 120 corresponds roughly to that shown in FIG. 9.

Next, the information block regarding “occupation,” the second field, shown in FIG. 8 is selected as the second specific information block (Step 122).

Next, set “1” in the category number of those rows in which the field value within the value control table of the specific information block matches “Student,” which is the aforementioned search condition, and set “0” in the category number of other rows (Step 124). In the case of this example, as shown in FIG. 13, “1” is set in the category number of those rows corresponding to a field value number of “0,” and “0” is set in other rows.

Next, sequentially extract from the result set for the first search condition those record numbers that represent pointers to records (Step 126). For example, as shown in FIG. 14, the record number “999999” is extracted.

Next, regarding the second specific information block, extract from the array of pointers to the value control table the field value numbers corresponding to the record number obtained with respect to the aforementioned first search condition (Step 128). For example, as shown in FIG. 14, the field value number of “0” corresponding to the record number of “999999” is extracted.

Next, a decision is made as to whether or not “1” is set in the category number corresponding to the field value number extracted with respect to the second specific information block (Step 130). For example, as shown in FIG. 14, one can see that “1” is set in the category number corresponding to the field value number of “0.”
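Steps 100 to 108 can be illustrated with a minimal Python sketch. The eight-record `ages` column, the dictionary layout and the names `build_block` and `search_single_field` are assumptions made for illustration only; the construction shown is not the population technique of FIGS. 29-35, merely one simple way to obtain the arrays the search relies on.

```python
from itertools import accumulate

def build_block(column):
    """Build a toy information block (hypothetical layout) from one column."""
    values = sorted(set(column))                   # value control table
    number_of = {v: i for i, v in enumerate(values)}
    to_value = [number_of[v] for v in column]      # array of pointers to the value control table
    counts = [0] * len(values)
    for n in to_value:
        counts[n] += 1
    starts = [0] + list(accumulate(counts))[:-1]   # start position per field value number
    cursor = list(starts)
    to_record = [0] * len(column)                  # array of pointers to records
    for rec, n in enumerate(to_value):
        to_record[cursor[n]] = rec
        cursor[n] += 1
    return {"values": values, "to_value": to_value, "counts": counts,
            "starts": starts, "to_record": to_record}

def search_single_field(block, wanted):
    # Step 102: category number 1 for matching field values, 0 otherwise.
    category = [1 if v in wanted else 0 for v in block["values"]]
    result = []
    for n, flag in enumerate(category):
        if flag:
            # Step 104: start position and count act as pointer extraction information.
            start, count = block["starts"][n], block["counts"][n]
            # Step 106: copy that slice of the array of pointers to records.
            result.extend(block["to_record"][start:start + count])
    return result                                  # Step 108: the result set

ages = [18, 16, 19, 17, 16, 19, 18, 16]            # toy data, record numbers 0..7
age_block = build_block(ages)
hits = search_single_field(age_block, {16, 19})
print(hits)
```

Note that the search itself never scans the records: it touches only the value control table and two contiguous slices of the array of pointers to records.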

In the case that “1” is set in the category number, the pointer to the record (for example, the record number) corresponding to the location within the array of pointers to the value control table that holds the field value number in question is added to the final result set (Step 132). For example, as shown in FIG. 14, the record number “999999” is added to the final result set.

In the case that the category number is “0,” the final result set is not updated.
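The loop of Steps 124 to 132 can be sketched as follows. This is a minimal illustration under stated assumptions: the first result set and the toy “occupation” arrays are invented sample data, and `and_search` is a hypothetical name.

```python
# Result set of the first search condition (age 16 or 19), as record numbers.
first_result = [1, 4, 7, 2, 5]

# Toy "occupation" information block (sample data, not from the figures).
occupation_values = ["Student", "Programmer", "Teacher", "Other"]  # value control table
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]   # field value number for each record

def and_search(first_result, values, to_value, wanted):
    category = [1 if v in wanted else 0 for v in values]   # Step 124
    final = []
    for rec in first_result:          # Step 126: next record number in the first result set
        n = to_value[rec]             # Step 128: its field value number in the second block
        if category[n] == 1:          # Step 130: does it satisfy the second condition?
            final.append(rec)         # Step 132: add to the final result set
    return final

final_result = and_search(first_result, occupation_values,
                          occupation_to_value, {"Student"})
print(final_result)
```

Only the records already in the first result set are visited, so the cost of the second condition depends on the size of that result set rather than on the total number of records.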

Note that as would be easily understandable to a person skilled in the art, the aforementioned method of searching upon an AND of multiple fields can be applied to searches other than AND searches, so such variations as a method of searching upon an OR of multiple fields, for example, would be possible. FIG. 46A is a flowchart illustrating one example of an OR search process on multiple fields. This process is also implemented by the CPU 100 executing a program stored in a stipulated area. As shown in FIG. 46A, first, after the result set is obtained with respect to the first search condition (Step 4601), an information block for the second search condition is selected (Step 4602). Next, regarding this information block, a category number is set with respect to the second search condition (Step 4603). While skipping record numbers contained in the result set from the first search condition, the array of pointers to the value control table is scanned sequentially with respect to the second specific information block (Step 4604). In more detail, the record numbers for which the category number was set to “1” regarding the second search condition are identified, and a decision is made as to whether or not each such record number is found within the result set according to the first search condition (Steps 4611-4615). If the number is not found within the result set according to the first search condition, that number is added to the result set (Step 4614). After this process is complete, a second result set is generated by combining the record numbers stored in the result set from the first search condition and the record numbers belonging to the field value numbers for which the category number is set with respect to the second information block (Step 4615), and this can be provided as output.

Alternately, the process shown in FIG. 46B may be executed. In this example, after the first result set is obtained based on the search conditions regarding the first specific information block, independently thereof, a second result set is obtained based on the second search conditions regarding the second specific information block (Steps 4621-4624), and an OR of the first result set and second result set (Step 4625) is performed using a bitmap (Step 4626), and a new result set is created based on this (Step 4627). Note that in the process of FIG. 46B, Steps 4621 and 4622 correspond to Steps 4602 and 4603 of FIG. 46A, and Step 4625 corresponds to Step 4601 of FIG. 46A.
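The FIG. 46A style of OR search can be sketched as below. The sample arrays and the name `or_search` are assumptions, and the use of a Python set to test membership in the first result set is a choice of this sketch, not something the description specifies.

```python
first_result = [1, 4, 7, 2, 5]                    # records matching the first condition

occupation_values = ["Student", "Programmer", "Teacher", "Other"]  # value control table
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]    # field value number for each record

def or_search(first_result, values, to_value, wanted):
    # Set the category numbers for the second search condition.
    category = [1 if v in wanted else 0 for v in values]
    already = set(first_result)       # assumption: a set is used for the skip test
    result = list(first_result)
    # Scan the array of pointers to the value control table of the second block,
    # skipping records that the first condition has already selected.
    for rec, n in enumerate(to_value):
        if rec in already:
            continue
        if category[n] == 1:
            result.append(rec)
    return result

either = or_search(first_result, occupation_values,
                   occupation_to_value, {"Programmer"})
print(either)
```

Unlike the AND case, every record must be visited once, because a record outside the first result set may still satisfy the second condition.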

FIGS. 15 and 16 are explanatory diagrams illustrating the method of multiple-field Boolean operation searching using bit flags according to Embodiment 3 of the present invention, illustrating the case of performing a search under the same search conditions as the search according to the aforementioned Embodiment 2 of the present invention. Multiple-field Boolean operation searching using bit flags is defined to mean a search wherein the search conditions are expressed by a Boolean operation among search conditions for each field. In this case, unlike in Embodiment 1, it is more advantageous for the result set obtained by means of a search on a single field to be constructed in the form of bit flags rather than as an array of record numbers. Namely, in accordance with the process illustrated in FIG. 47, the result set is generated by allocating one bit each to all of the records, and a bit value of “1” or “0” expresses whether or not each record matches the search conditions. In more detail, in the same manner as in the other embodiments, an information block containing field values pertaining to the search condition is selected (Step 4701), and then the category number is set to “1” on rows that match the search conditions (Step 4702). Next, the corresponding category number is accessed for each record and the bit value to be stored in the result set is determined (Steps 4703-4707). By forming the result set in this manner, the size of the result set for each field corresponds to the number of records in the table-format data, so the size of the result set is identical for each field, and as a result, it is simple to perform Boolean operations, e.g., AND, OR and XOR, on elements in the result set.

In this example, the result set A shown in FIG. 15 and the result set B shown in FIG. 16 are joined under AND conditions to obtain the desired search result set in bit flag format. In addition, the search result set in bit flag format thus obtained can be converted to a result set in the format of an array of pointers to records, and thus combined with the aforementioned method of searching on multiple fields according to Embodiment 2 of the present invention.

Next, we shall add an explanation of the method of tabulating various types of table-format data according to the present invention. The tabulating method according to Embodiment 4 of the present invention comprises counting the number of records that have a specific field value in a specific field. In Embodiment 4 of the present invention, we shall consider the case of counting the number of records that have the field value of “Male” or that have the field value of “Female” in the “sex” field. As illustrated in FIG. 6, according to a preferred embodiment of the present invention, the information block regarding “sex” contains a count of the records that contain the field value of “Male” (its value being “632564”) and a count of the records that contain the field value of “Female” (its value being “367436”), so a simple tabulation of the number of records can be obtained immediately by accessing the array of counts within the information block.

In addition, by combining the category numbers described regarding the method of searching on single fields with the method of tabulating according to the aforementioned Embodiment 4 of the present invention, counting the number of records can be performed easily even in the case of more complicated conditions. For example, in the method of searching for records in which the value of the “age” field is “16” or “19” described in Embodiment 1 of the present invention, it is possible to tabulate the counts corresponding to field value numbers in which “1” is set as the category number, and thus tabulate the number of records matching the search conditions. In this manner, by using the category number, even in the case that the value control table is of a large size or in the case that complex conditions are given, the count can be found efficiently. More generally, as shown in FIG. 48, it is sufficient to find the field value numbers in which “1” is set in the category number (Step 4802) and then add the corresponding counts (Step 4803).

Next, we shall provide additional explanation of Embodiment 5 of the present invention. In this embodiment, we shall calculate the average age of the “Males.” The average age can be calculated by the formula (total “age” of the males)/(count of males), and also the count of males can be obtained by the tabulation method described in the aforementioned Embodiment 4 of the present invention. Therefore, in this embodiment, it comes down to the problem of finding the total “age” of the males. FIG. 17 is a flowchart illustrating the operation of Embodiment 5 of the present invention. In the same manner as in other embodiments, this process is also implemented by the CPU 100 executing a program stored in a stipulated area.
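A bit-flag search of this kind can be sketched as follows. Plain Python lists of 0/1 stand in for the bit flags, the sample arrays are invented, and the name `bit_flags` is hypothetical; a packed bitmap would serve the same purpose.

```python
age_values = [16, 17, 18, 19]                     # value control table for "age"
age_to_value = [2, 0, 3, 1, 0, 3, 2, 0]           # field value number for each record

occupation_values = ["Student", "Programmer", "Teacher", "Other"]
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]

def bit_flags(values, to_value, wanted):
    # Set category number 1 on rows matching the search condition ...
    category = [1 if v in wanted else 0 for v in values]
    # ... then look up the category number once per record to obtain its bit.
    return [category[n] for n in to_value]

result_a = bit_flags(age_values, age_to_value, {16, 19})
result_b = bit_flags(occupation_values, occupation_to_value, {"Student"})

# Both result sets have one bit per record, so Boolean operations are element-wise.
both = [a & b for a, b in zip(result_a, result_b)]

# Conversion back to an array of pointers to records (record numbers).
records = [rec for rec, bit in enumerate(both) if bit]
print(both, records)
```

Replacing `&` with `|` or `^` gives the OR and XOR combinations without any other change.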
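The count tabulation of FIG. 48 reduces to summing the counts of the flagged rows. In the sketch below the “sex” counts are the ones quoted above from FIG. 6, while the “age” counts are toy values and `count_matching` is a hypothetical name.

```python
def count_matching(values, counts, wanted):
    # Find the field value numbers whose category number is 1 (Step 4802) ...
    category = [1 if v in wanted else 0 for v in values]
    # ... and add the corresponding counts (Step 4803).
    return sum(c for c, flag in zip(counts, category) if flag)

# Counts quoted in the text for the "sex" information block.
sex_values = ["Male", "Female"]
sex_counts = [632564, 367436]
males = count_matching(sex_values, sex_counts, {"Male"})
everyone = count_matching(sex_values, sex_counts, {"Male", "Female"})

# Toy "age" block (sample data): how many records are aged 16 or 19?
age_values = [16, 17, 18, 19]
age_counts = [3, 1, 2, 2]
young_or_19 = count_matching(age_values, age_counts, {16, 19})
print(males, everyone, young_or_19)
```

The work is proportional to the number of distinct field values, not to the number of records.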

First, the information block regarding “sex” as shown in FIG. 6 is selected as the first specific information block (Step 140), and the field value number of “0” corresponding to the field value of “Male” is detected from within the value control table of the specific information block (Step 142). Next, the count corresponding to the field value number of “0” is “632564,” so the total number of males is determined to be 632564, and also, the start position corresponding to the field value number of “0” is “0,” so the pointers to records wherein the sex is male are determined to be stored in the locations starting from the beginning until the 632564th location, and thus a list of the pointers to these records, namely, an array of record numbers, is kept as the result set (Step 146).

Next, the information block regarding “age” illustrated in FIG. 7 is selected as the second specific information block (Step 148), and from the array of pointers to the value control table of the second specific information block, the field value number corresponding to the record specified in the result set regarding the first specific information block is extracted (Step 150), and the field value related to the extracted field value number, namely the “age,” is extracted (Step 152). Finally, find the total age by sequentially adding the extracted “age” values (Step 154), and repeat Steps 150, 152 and 154 until all of the specified records in the aforementioned result set are processed (Step 156). The total age thus obtained is divided by the count to find the average age (Step 158).

Next, we shall add an explanation of Embodiment 6 of the present invention. In this example, we find the average age of the male students and the average age of female students. FIG. 18 is a conceptual explanatory diagram of Embodiment 6 of the present invention, while FIG. 19 is a flowchart of the operation of Embodiment 6 of the present invention.

In this embodiment, the tabulation is performed by first selecting the information block regarding “occupation” as the first information block (Step 170), and using the search condition of “occupation is student” to create from among all records a result set containing the records wherein the “occupation is student” (Step 172).

Next, select the information block regarding “sex” as the second information block and also select the information block regarding “age” as the third information block (Step 174), and sequentially extract pointers to records from the beginning of the result set (Step 176).
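The flow of Steps 140 to 158 can be sketched as below. The two toy information blocks are invented sample data (with “Male” at field value number 0, as in the text), the dictionary layout and the name `average_for_value` are hypothetical, and the sketch assumes the selected field value has a non-zero count.

```python
# Toy "sex" block: only the parts needed to obtain the result set.
sex_block = {
    "values": ["Male", "Female"],              # value control table
    "counts": [5, 3],                          # records per field value number
    "starts": [0, 5],                          # start positions
    "to_record": [1, 2, 4, 5, 7, 0, 3, 6],     # array of pointers to records
}
# Toy "age" block: only the parts needed to read each record's age.
age_block = {
    "values": [16, 17, 18, 19],                # value control table
    "to_value": [2, 0, 3, 1, 0, 3, 2, 0],      # field value number for each record
}

def average_for_value(first, second, wanted):
    n = first["values"].index(wanted)                        # Step 142
    start, count = first["starts"][n], first["counts"][n]
    result_set = first["to_record"][start:start + count]     # Step 146
    total = 0
    for rec in result_set:                                   # repeated until Step 156
        number = second["to_value"][rec]                     # Step 150
        total += second["values"][number]                    # Steps 152 and 154
    return total / count                                     # Step 158 (count > 0 assumed)

average_age = average_for_value(sex_block, age_block, "Male")
print(average_age)
```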

Using the extracted pointers to records, the array of pointers to the value control table of the second information block is accessed to get the sex corresponding to the extracted pointers to records, and also, the array of pointers to the value control table of the third information block is accessed to extract the age corresponding to the extracted pointers to records (Step 178). For each extraction, the count for the corresponding sex is incremented by 1 and the extracted age is added to the total age for that sex, so that the count and the total age are calculated for both males and females (Step 180).

A check is made as to whether or not all pointers to records of the result set have been processed (Step 182), and if all pointers to records have been processed, the average ages for both male and female students are calculated by dividing the total ages for males and females by the respective counts (Step 184).

Next, we shall add an explanation of Embodiment 7 of the present invention made with reference to the flowchart of operation illustrated in FIG. 20. In this embodiment, the so-called cross-tabulation is implemented. Note that the processing program shown in FIG. 20 is also read and executed by the CPU 100. In this embodiment, we shall consider the case of finding counts by sex/by occupation taking the entire set of records regarding the table-format data of FIG. 1.

Regarding the two fields of “sex” and “occupation” used in tabulation, the respective value control tables and the field value number-specifying information arrays, namely, the arrays of pointers to the value control table, which express two pieces of individual field information, namely the first and second information blocks, are kept in a storage device (Step 190). The storage device may be implemented in the form of, for example, memory, virtual storage, a memory-mapped file or the like.

Regarding the first information block regarding sex, as shown in FIG. 6, the total number of field value numbers for sex is “2,” and regarding the second information block regarding occupation, as shown in FIG. 8, one can see that the total number of field value numbers for occupation is “4.” Thus, an initialized 2×4 (2 row by 4 column) two-dimensional array is generated as the space to store tabulation data (Step 192).

Regarding the first and second information blocks, the field value numbers q₁ and q₂, respectively, are extracted sequentially from the beginning of the array of pointers to the value control table, and these are used to identify a single element P(q₁, q₂) in the two-dimensional array (Step 194), and then the value of the identified element P(q₁, q₂) is incremented by 1 (Step 196).

A check is made as to whether or not all field value numbers (namely, a number of field value numbers equal to the total number of records) have been extracted from the array of pointers to the value control table (Step 198), and if field value numbers still remain, then return to Step 194, but if not, terminate the tabulation process. This completes the two-dimensional cross table.
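Steps 192 to 198 amount to one pass over the two arrays of pointers to the value control table. A minimal sketch, assuming invented sample arrays and the hypothetical name `cross_tabulate`:

```python
sex_to_value = [1, 0, 0, 1, 0, 0, 1, 0]          # 0 = Male, 1 = Female (toy data)
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]   # 0 = Student, 1 = Programmer, ...

def cross_tabulate(n_rows, n_cols, row_numbers, col_numbers):
    # Step 192: initialized two-dimensional array for the tabulation data.
    table = [[0] * n_cols for _ in range(n_rows)]
    # Steps 194-198: take q1 and q2 record by record and increment P(q1, q2).
    for q1, q2 in zip(row_numbers, col_numbers):
        table[q1][q2] += 1
    return table

table = cross_tabulate(2, 4, sex_to_value, occupation_to_value)
for row in table:
    print(row)
```

Because the field value numbers are used directly as array coordinates, no comparison of the field values themselves is needed during the pass.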

In the aforementioned Embodiment 7 of the present invention, tabulation is performed on the entire set of records in the table-format data of FIG. 1, but it is also possible to perform the same type of tabulation on a partial set of records, for example, tabulating a count of 16-year-olds by sex/occupation. In order to do this, a single-field search is first performed using the age of 16 as the search condition, and then the identifying information for records that match an age of 16 is acquired and kept. Next, as described previously with regard to searches on an AND of multiple fields, it is sufficient to take the records contained in this result set, namely the partial set of records, as the object of the operation, and sequentially extract record identifying information contained in the result set starting from the beginning, and then extract the field value numbers corresponding to the record identifying information from the array of pointers to the value control table, and take the field value numbers thus obtained as the row and column coordinates of a two-dimensional array while incrementing the count of element values by “1.” Namely, in this process, as shown in FIG. 49, a process roughly equivalent to that shown in FIG. 9 is executed to generate a result set containing pointers to records (Step 4901). Next, this result set is used to execute a process roughly equivalent to that of FIG. 20. Here, first, in the same manner as in FIG. 20, the first and second information blocks regarding the fields used in tabulation are kept in the storage device (Step 4902), and next, an initialized two-dimensional array is generated (Step 4903).

Thereafter, a number that indicates the storage position of pointers in the result set (hereinafter referred to as the “storage position number” depending on the case) is initialized (Step 4904). In accordance with the program, for each storage position number, the CPU 100 extracts the field value numbers from the arrays of pointers to the value control table in the first and second information blocks and identifies an element P(q₁, q₂) within the two-dimensional array (Step 4905), and next, the element P(q₁, q₂) is incremented (Step 4906). By executing this process with respect to all storage position numbers (namely, by processing all of the pointers to records within the result set), it is possible to achieve cross-tabulation regarding a partial set obtained by searching or the like.
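Cross-tabulation on a partial set differs from the full pass only in that the loop runs over the result set. The sketch below uses invented sample data and the hypothetical name `cross_tabulate_subset`; the result set is assumed to have been produced by a prior single-field search for age 16.

```python
result_set = [1, 4, 7]                           # record numbers with age 16 (toy data)
sex_to_value = [1, 0, 0, 1, 0, 0, 1, 0]          # 0 = Male, 1 = Female
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]   # 0 = Student, 1 = Programmer, ...

def cross_tabulate_subset(result_set, n_rows, n_cols, row_numbers, col_numbers):
    table = [[0] * n_cols for _ in range(n_rows)]          # Step 4903
    for position in range(len(result_set)):                # storage position number
        rec = result_set[position]
        q1, q2 = row_numbers[rec], col_numbers[rec]        # Step 4905
        table[q1][q2] += 1                                 # Step 4906
    return table

subset_table = cross_tabulate_subset(result_set, 2, 4,
                                     sex_to_value, occupation_to_value)
print(subset_table)
```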

Next, we shall add an explanation of Embodiment 8 of the present invention. In this example, cross-tabulation is implemented in the situation wherein the field values of the field are divided into several categories, by counting the counts for each category of field value. For example, referring to the information block regarding “occupation” shown in FIG. 8, one sees that the four field values of “student,” “programmer,” “teacher” and “other” are registered for “occupation.” As the categories based on these field values, one can envision the case of recategorization into the three types of “income earner,” “non-income earner” and “unknown.” In this example, in this situation, a new category of “presence of income” is created to create a cross-tabulation of counts depending on sex/presence of income.

The information block regarding “occupation” shown in FIG. 21 includes a value control table wherein category numbers are applied to each field value number based on the “presence of income” in particular. In this example, students are assigned a category number of “1” (non-income earner), while programmers and teachers are assigned a category number of “0” (income earner), and “other” is assigned a category number of “2” (unknown).

As a general rule, the cross-tabulation in Embodiment 8 of the present invention has a process sequence roughly the same as that of the cross-tabulation in Embodiment 7, but it differs in the point that it uses, as the coordinates that specify the element of the two-dimensional array to store the tabulation data, the field value number of the first information block regarding sex and the category number of the second information block regarding occupation.

Since either the field value numbers or the category numbers can be used as the coordinates of elements in the two-dimensional array, in Embodiment 8 of the present invention, with respect to the first and second information blocks, the respective field value numbers stored in each array of pointers to the value control table are extracted sequentially, and the coordinates of the element P in the two-dimensional array are identified based on either the field value number itself extracted from the array of pointers to the value control table or the category number stored in the value control table corresponding to the field value number.

In the example described above, the information block on “sex” is used as the first information block, and the information block on “occupation” is used as the second information block (see Step 190 of FIG. 20). In the next processing step (Step 192), since the information block on “sex” contains two field value numbers and the information block on “occupation” contains three category numbers, an initialized 2×3 (2 row by 3 column) two-dimensional array is generated.

In the subsequent processing steps also, the field value number q₁ of the first block and the category number q₂ of the second block are extracted, so these are used to identify a single element P(q₁, q₂) and then the value of the element P is incremented (see Steps 194 and 196).
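The recategorized cross-tabulation can be sketched as follows. The category numbers follow the assignment described for FIG. 21 (0 = income earner, 1 = non-income earner, 2 = unknown); the per-record arrays are invented sample data and `cross_tabulate_by_category` is a hypothetical name.

```python
sex_to_value = [1, 0, 0, 1, 0, 0, 1, 0]            # 0 = Male, 1 = Female (toy data)

occupation_values = ["Student", "Programmer", "Teacher", "Other"]  # value control table
occupation_category = [1, 0, 0, 2]                 # category number per field value number
occupation_to_value = [1, 0, 0, 2, 3, 0, 1, 0]     # field value number for each record

def cross_tabulate_by_category(row_numbers, col_numbers, col_category):
    n_rows = max(row_numbers) + 1
    n_cols = max(col_category) + 1                 # three categories give three columns
    table = [[0] * n_cols for _ in range(n_rows)]
    for q1, number in zip(row_numbers, col_numbers):
        q2 = col_category[number]                  # category number replaces the field value number
        table[q1][q2] += 1
    return table

income_table = cross_tabulate_by_category(sex_to_value, occupation_to_value,
                                           occupation_category)
print(income_table)
```

The only change from the plain cross-tabulation is one extra lookup per record, so regrouping field values does not require rebuilding the information block.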

The cross-tabulation according to Embodiments 7 and 8 described above is tabulation in the form of finding counts, but note that the present invention may also be expanded to cross-tabulation wherein an average age is found depending on multiple fields (e.g., by sex/by occupation). In Embodiment 9, cross-tabulation of the aforementioned type is performed.

In order to expand the tabulation method of the aforementioned Embodiment 7 of the present invention, wherein counts are found by incrementing the elements of a two-dimensional array one at a time, to cross-tabulation of a type that requires operations other than the totaling of counts, such as finding an average age, the variation shown in the operation flow chart in FIG. 22 is adopted.

To wit, according to Embodiment 9 of the present invention, two two-dimensional arrays are used for tabulation. In the first two-dimensional array, the counts by sex/by occupation are tallied in the same manner as in the aforementioned Embodiment 7, and in the second two-dimensional array, the total age by sex/by occupation is calculated.

Here follows a more detailed explanation.

First, the first, second and third information blocks for the three fields of sex, occupation and age are loaded into the storage device (Step 200).

Corresponding to the total number of field value numbers for sex and occupation of “2” and “4” respectively, an initialized 2×4 (2 row by 4 column) two-dimensional array for storing tabulation data is created (Step 202).

Starting from the beginning of the arrays of pointers to the value control table of the first, second and third information blocks, the field value numbers q₁, q₂ and q₃ are extracted sequentially to identify the coordinates (q₁, q₂) of an element of the two-dimensional array (Step 204), and then the value of the element P₁(q₁, q₂) of the first two-dimensional array thus identified is incremented by 1 (Step 206).

Moreover, with respect to the information block regarding “age,” the field value corresponding to the field value number of q₃ (namely, the age) is acquired (Step 208), and the acquired age is added to the element P₂(q₁, q₂) of the identified second two-dimensional array (Step 210).

After this processing, a check is made as to whether or not all subject records have been processed (Step 212). If not, control returns to Step 204; if so, the operation P₂(q₁, q₂)/P₁(q₁, q₂) is performed for each pair of corresponding elements of two-dimensional array P₁ and two-dimensional array P₂ (Step 206). Thereby, the average age by sex/by occupation is obtained and a cross-tabulation table of averages is created.
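The two-array procedure of Embodiment 9 can be sketched as follows. The data, the variable names and the function `cross_tabulate_average` are hypothetical, and the guard that returns `None` for a cell with no records is an assumption added here; the text does not say how empty cells are handled.

```python
# Hypothetical miniature data; all names and values are illustrative only.
sex_ptr = [0, 1, 1, 0, 1, 0]          # 0=Female, 1=Male
occ_ptr = [0, 1, 2, 0, 1, 0]          # 0=Student, 1=Programmer, 2=Teacher, 3=Other
age_values = [16, 17, 18, 19, 20]     # value control table for "age"
age_ptr = [0, 4, 2, 2, 0, 1]          # ages 16, 20, 18, 18, 16, 17


def cross_tabulate_average(sex_ptr, occ_ptr, age_ptr, age_values, n_sex, n_occ):
    """Return the average age per (sex, occupation) cell."""
    p1 = [[0] * n_occ for _ in range(n_sex)]   # first array: counts
    p2 = [[0] * n_occ for _ in range(n_sex)]   # second array: total of ages
    for q1, q2, q3 in zip(sex_ptr, occ_ptr, age_ptr):
        p1[q1][q2] += 1                 # count the record
        p2[q1][q2] += age_values[q3]    # add the age looked up via q3
    # Element-wise P2 / P1. Returning None for empty cells is an assumption.
    return [
        [p2[i][j] / p1[i][j] if p1[i][j] else None for j in range(n_occ)]
        for i in range(n_sex)
    ]


averages = cross_tabulate_average(sex_ptr, occ_ptr, age_ptr, age_values, 2, 4)
```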

FIG. 23A is a conceptual explanatory diagram of a cross-tabulation table obtained in the aforementioned Embodiment 7 of the present invention. In this manner, in the aforementioned Embodiment 7, counts for all combinations of sex/occupation are tabulated. However, as shown in FIG. 23B, among the by sex/by occupation categories, there may be cases wherein one wishes to know in particular the count of only those persons having a sex of female and occupation of student. By means of the present invention, the count in this case is obtained by finding the size of the result set from a search of an AND of the multiple fields of “female” AND “student.”

Similarly, in the aforementioned Embodiment 9 of the present invention, a cross-tabulation table of the average age is found for all combinations of sex/occupation, but it is also possible to find in particular the average age of only those persons having a sex of female and occupation of student. In this case, the count is found from the size of the result set from a search of an AND of the multiple fields of “female” AND “student,” and the total of ages is found by adding the ages belonging to the records specified by the record-identifying information contained in the result set. By calculating the fraction (total of ages)/(count), it is possible to find the desired value (e.g., the average age) for a specific cell in the cross-tabulation table for average age.

Next, we shall add an explanation of Embodiment 10 of the present invention. FIG. 24 is a diagram illustrating multi-answer type fields, while FIG. 25 is an explanatory diagram of an information block of a type compatible with multi-answer type fields according to Embodiment 10 of the present invention. “Multi-answer” refers to the situation wherein, for example, when answers are obtained to the question “What kinds of writing implements are now on the table?”, multiple answers such as “pencil, eraser” or “paper, pencil” are obtained from the same person. To wit, in the case of multi-answer, it is possible to specify multiple field values for a single field of a single record. FIG. 24 shows a list of the responses to the aforementioned question obtained from 1 million people, given as is.

In order to process such data, in Embodiment 10 of the present invention, as shown in FIG. 25, the array of pointers to the value control table of the information block is not an array of field value numbers themselves as described above; rather, 1 bit is allocated to each field value number in each pointer in the array. Therefore, whether or not a record specifies a given field value number can be indicated by turning the corresponding bit on or off (namely, by a binary number). Thereby, it is possible to specify multiple field values contained in a single field of a single record. For example, in FIG. 25, the pointers (bit pointers) within the array of pointers are 4 bits in size. When the highest bit is on (namely, “1”), this means that the response of “paper” is included; when the second bit is on, the response of “ruler” is included; and when the third bit is on, the response of “eraser” is included. Moreover, when the lowest bit is on, the response of “pencil” is included.

The pointer corresponding to record number “0” has the value “3.” This can be considered to be “2¹+2⁰.” Therefore, the responses of “pencil” and “eraser” are understood to be included for this record number. In addition, the pointers corresponding to record number “1” and record number “2” have the values “4” and “10,” respectively, and these can be considered to be “2²” and “2³+2¹.” Therefore, one can know that the response for record number “1” includes “ruler,” and that the responses for record number “2” include “eraser” and “paper.”

By means of this embodiment, each bit in the pointer value is given meaning so a plurality of field value numbers can be indicated. Therefore, even in the case in which a record has a plurality of field values, this can be expressed by means of the pointer value.
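The bit pointers of Embodiment 10 can be decoded with ordinary bitwise operations. The following is a minimal sketch under the bit assignment described for FIG. 25 (lowest bit “pencil,” then “eraser,” “ruler,” and highest bit “paper”); the names `ANSWERS`, `decode` and `encode` are hypothetical.

```python
# Bit 0 (lowest) through bit 3 (highest), following the assignment described for FIG. 25.
ANSWERS = ["pencil", "eraser", "ruler", "paper"]


def decode(pointer):
    """Return the list of answers whose bit is on in the pointer value."""
    return [name for bit, name in enumerate(ANSWERS) if pointer & (1 << bit)]


def encode(names):
    """Return the pointer value with one bit turned on per answer."""
    value = 0
    for name in names:
        value |= 1 << ANSWERS.index(name)
    return value


record0 = decode(3)    # 2^1 + 2^0
record1 = decode(4)    # 2^2
record2 = decode(10)   # 2^3 + 2^1
```

A search for records that include a given answer then reduces to a single AND against the corresponding bit for each record.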

Note that the present invention has an advantage in that it can be easily adapted to a multi-answer situation by simply modifying the constitution of one portion of the information block. In fact, an information block thus modified can be used to replace the information blocks adopted in the various aforementioned embodiments of the present invention.

Next, we shall add an explanation of Embodiment 11 of the present invention. FIG. 26 is an explanatory diagram illustrating the method of handling blanks, error values and other special values that occur during tabulation processing, according to Embodiment 11 of the present invention. As shown in this figure, in Embodiment 11, cross-tabulation is executed by taking blanks to be one category. When handling actual data, there may be cases in which blanks, or mathematical errors such as log(−1), appear. With the present invention, even if such special values (blanks, errors, etc.) are present, they can advantageously be registered in the value control table as field values, and the registered special values can be used as is as categories for searches or tabulation.

Next, we shall add an explanation of Embodiment 12 of the present invention. We shall describe delay evaluation using the flowchart, shown in FIG. 27, of the operation of the method of searching upon multiple fields according to Embodiment 12 of the present invention. In this embodiment, in the same manner as in Embodiment 2 of the present invention, we shall consider the case of obtaining a set of records that satisfy both the first search condition of the “age” being “16” or “19” and the second search condition of the “occupation” being “Student.”

In the aforementioned Embodiment 2 of the present invention, the category numbers are set in advance for all field value numbers (Step 124 of FIG. 12), but in the case of Embodiment 12, category numbers are set only for the field value numbers actually accessed based on the result set from the search on the first search condition.

As described previously, from the first specific information block, which is the information block regarding “age” (the first field), a result set of records wherein the “age” is “16” or “19” is obtained according to Embodiment 1 of the present invention (Step 220).

Next, the second specific information block, which is the information block regarding “occupation” (the second field) shown in FIG. 8, is selected (Step 222), and the value of all category numbers in the value control table of the second specific information block is initialized to “−1,” for example (Step 224).

Next, from the result set regarding the first search condition, record numbers that represent pointers to records are extracted sequentially (Step 226). In the case of this example, as shown in FIG. 14, the record number “999999” is extracted, for example.

Next, regarding the second specific information block, the field value number corresponding to the record number obtained under the aforementioned first search condition is extracted from the array of pointers to the value control table (Step 228). In the case of this example, as shown in FIG. 14, the field value number of “0” corresponding to the record number of “999999” is extracted, for example.

Next, a check is made as to whether the value of the category number corresponding to the field value number extracted with respect to the second specific information block is “−1” or not (Step 230).

In the case that the category number is “−1,” this means that the category number has not yet been set for that field value number, so a decision is made as to whether or not the field value corresponding to this field value number matches the aforementioned second search condition (Step 232). If it matches, the category number is set to “1” (Step 234), but if it does not match, the category number is set to “0” (Step 236).

In the case that the category number is not “−1,” a decision is made as to whether or not the value of the category number corresponding to the field value number extracted above is set to “1” (Step 238). If the value of the category number is set to “1,” then a pointer to the record, e.g. the record number, corresponding to the location within the array of pointers to the value control table at which the pointer indicating that field value number is stored, is added to the final result set (Step 240). In this example, as shown in FIG. 14, record number “999999,” for example, is added to the final result set. If the category number is “0,” then the final result set is not updated.
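The delay evaluation of Steps 224 through 240 can be sketched as follows. This is an illustration under stated assumptions, not the flowchart itself: the ten-record data set and the names `lazy_and_search`, `first_result` and `predicate` are hypothetical, and the sketch assumes that once a category number has been set in Step 234 or Step 236, control continues to the check of Step 238 for the same record.

```python
def lazy_and_search(first_result, ptr_to_vct, field_values, predicate):
    """Filter first_result by a second condition, evaluating it once per field value."""
    category = [-1] * len(field_values)   # Step 224: -1 means "not yet evaluated"
    evaluations = 0                       # number of times the condition is tested
    final_result = []
    for record_no in first_result:        # Step 226
        q = ptr_to_vct[record_no]         # Step 228: field value number of this record
        if category[q] == -1:             # Step 230
            # Steps 232-236: test the field value itself, once only.
            category[q] = 1 if predicate(field_values[q]) else 0
            evaluations += 1
        if category[q] == 1:              # Step 238
            final_result.append(record_no)   # Step 240
    return final_result, evaluations


# Hypothetical data: occupation information block for ten records.
occ_values = ["Student", "Programmer", "Teacher", "Other"]
occ_ptr = [2, 1, 0, 3, 0, 3, 0, 1, 3, 3]
first_result = [2, 3, 4, 7, 9]            # assumed result of the first search

matches, evaluations = lazy_and_search(
    first_result, occ_ptr, occ_values, lambda value: value == "Student"
)
```

The condition is tested at most once per distinct field value reached from the first result set, and never for field values that no record in that set refers to.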

The delay evaluation as shown in the aforementioned Embodiment 12 is effective in the following types of cases. For example, consider the case in which a customer database of 1 million people exists and one wishes to implement a telephone survey, and thus extract a sample of 100 people. For example, when the people are narrowed down to those who satisfy stipulated conditions (sex, age, occupation, etc.) one can come up with 10,000 people, and then in order to ensure randomness, a search is performed based on the numbers (e.g., “12”) at the end of the telephone number.

In this case, in Embodiment 12, first the elements of the “category number array” are filled with “−1” so that only the aforementioned set of 10,000 people is evaluated. Namely, for the result set of a size of 10,000 people, the elements of the category number array are accessed, and if and only if the element is “−1,” the telephone number is accessed and the result of the check is stored in the corresponding element of the “category number array.” Thereby, it is possible to keep the number of checks down to 10,000. In this manner, by means of Embodiment 12, it is possible to reduce the number of processing steps greatly in comparison to an ordinary AND search.

In addition, by using the information block according to the present invention, data that has a structure like that of a telephone number consisting of the “country code+area code+central office code+number” can be divided and registered in multiple information blocks, and this has an advantage in that searching and tabulation regarding a country code, area code or other partial data can be performed easily.

In addition, by using the category number according to the embodiments of the present invention as described previously, it is possible to generate new categories for ages, for example, by taking the ages “10-19” to be the “tens,” the ages “20-29” to be the “twenties” and so on, and methods of searching and tabulating similar to those described above can be applied to the new categories thus generated.
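As a small illustration of generating such categories, the category number of each age field value can be derived by integer division by ten. The data and names below are hypothetical, and the use of integer division is one possible way of defining the decades, assumed here for brevity.

```python
# Hypothetical value control table for "age" and its array of pointers.
age_values = [16, 17, 18, 19, 20, 25, 31]
age_ptr = [0, 3, 4, 5, 6, 1, 4]        # ages 16, 19, 20, 25, 31, 17, 20

# One category number per field value number:
# 1 = "tens", 2 = "twenties", 3 = "thirties" (assumed encoding).
age_category = [age // 10 for age in age_values]


def count_by_category(ptr, category):
    """Tabulate the number of records per category number."""
    counts = {}
    for q in ptr:
        counts[category[q]] = counts.get(category[q], 0) + 1
    return counts


decade_counts = count_by_category(age_ptr, age_category)
```

The category numbers are computed once per distinct age rather than once per record.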

As described above, the apparatus for implementing searching and tabulating according to embodiments of the present invention is implemented by means of an ordinary computer system shown in FIG. 28, for example a personal computer including a CPU 100, ROM 110, RAM 120, a hard disk 130, a display 140 or other output device, and a keyboard/mouse 150 or other input device connected to each other via a bus 160. Therefore, as described above, the program for constructing the information block for implementing the aforementioned embodiments (information block generating program) may be recorded on CD-ROM, ROM 110 or the hard disk 130, or may be supplied from outside via a network (not shown).

With reference to the flowchart shown in FIG. 29, here follows one example of the method of constructing an information block of the format shown in FIG. 5 for the table-format data shown in FIG. 2B.

Step 300: Data Preparation

First, data of the format shown in FIG. 2B is prepared. Next, this is divided by field. In FIG. 2B, it can be divided into the fields of “sex,” “age” and “occupation.”

Step 311: Generation of the Information Block for the “Sex” Field

Generate one information block, which is to be the information block for the “sex” field, for example.

Step 312: Generation of the Value Control Table

Next, initialize the value control table and scan the “sex” field data from beginning to end, while counting the number of instances of each field value and storing this data. In the case of this example, up until this step, the field values of “female” and “male” are set in the array of field values 11 of the aforementioned value control table, while the values of “367436” and “632564” respectively are set in the array of counts 14 of the aforementioned value control table corresponding to the aforementioned field values.

Next, the field values (“female” and “male”) within the array of field values 11 are sorted according to a stipulated basis. Naturally, at the time of this sort, the array of counts 14 must also be reordered along with the array of field values 11.

Moreover, set the start positions in the array of start positions 13 of the value control table. The start position for a given field value is found as the cumulative total of the counts in the array of counts 14 for all field values preceding it in the value control table. Naturally, the value of the first start position is “0.”

Next, copy the content of the array of start positions 13 to the array of category numbers 12. The array of category numbers 12 is used later as a work area at the time of creating the array of pointers to records.

Step 313: Creation of the Array of Pointers to the Value Control Table

Next, allocate a storage area for the array of pointers to the value control table 20. (The size of the storage area is the total of the counts in the aforementioned array of counts 14.)

Next, extract one field value at a time from the “sex” field data from the beginning to the end, examine each field value to see if it matches the field value at each entry of the value control table, and if it matches the n^(th) field value, then set “n−1” as a pointer to the value control table in the aforementioned array of pointers to the value control table.

Step 314: Creation of the Array of Pointers to Records

Next, allocate a storage area for the array of pointers to records 30. In this example, the size of the storage area is the total of the counts in the aforementioned array of counts 14. From the beginning row to the ending row of the array of pointers to the value control table 20, extract one pointer to the value control table at a time. Extract the J^(th) value of the array of pointers to the value control table 20, and assuming its value is “K,” extract the category number corresponding to the K+1^(th) row of the value control table. Assuming its value is “L,” store “J−1” in the L+1^(th) element of the array of pointers to records 30, and increment by 1 the category number corresponding to the K+1^(th) row of the value control table.

The aforementioned operation completes the creation of the information block for the “sex” field (Step 310). Information blocks for the “age” field and “occupation” field can be created in the same manner (Step 320 and Step 330), and thus information blocks for the entire table-format data are obtained.
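Steps 312 through 314 can be sketched for a single field as follows. This is a simplified illustration, not the information block generating program itself: the function `build_information_block` and the five-record column are hypothetical, a dictionary lookup stands in for examining each entry of the value control table, and the counts and the pointers to the value control table are filled in during the same scan for brevity.

```python
def build_information_block(column):
    """Build the arrays of one information block from a column of field values."""
    # Step 312: value control table with sorted field values and their counts.
    field_values = sorted(set(column))
    number_of = {value: n for n, value in enumerate(field_values)}  # assumed lookup aid
    counts = [0] * len(field_values)

    # Step 313: array of pointers to the value control table (0-based field value numbers).
    ptr_to_vct = []
    for value in column:
        n = number_of[value]
        ptr_to_vct.append(n)
        counts[n] += 1

    # Start positions: cumulative total of the counts of the preceding field values.
    start_positions = []
    total = 0
    for count in counts:
        start_positions.append(total)
        total += count

    # Step 314: array of pointers to records, using a copy of the start
    # positions (the category number array in the text) as a work area.
    work = list(start_positions)
    ptr_to_records = [0] * len(column)
    for record_no, k in enumerate(ptr_to_vct):
        ptr_to_records[work[k]] = record_no
        work[k] += 1

    return field_values, counts, start_positions, ptr_to_vct, ptr_to_records


block = build_information_block(["Female", "Male", "Male", "Female", "Male"])
```

In the result, the records having field value number k occupy the slice of the array of pointers to records that begins at `start_positions[k]` and has length `counts[k]`.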

FIGS. 30 through 35 are explanatory diagrams for the procedure of creating the information block regarding “occupation” in the table-format data shown in FIG. 1.

FIG. 30 is a diagram illustrating populating with new data in the case in which categories are already defined and the types of attribute values are known in advance. Here, the value control table is created according to known category definitions. Since the start positions and counts are unknown, these are initialized to “0.” In addition, storage areas are allocated for the array of pointers to the value control table and the array of pointers to records, and these are similarly initialized.

FIG. 31 shows the first pass, in which the array of pointers to the value control table and the counts in the value control table are completed. The data to be populated is taken one item at a time starting from the beginning, and its value is examined as to which item (namely, which field value number) in the value control table it matches. That field value number is then stored in the array of pointers to the value control table, and the corresponding count in the value control table is incremented by 1. The example of FIG. 31 shows the state after the processing of the second item of data to be populated is complete.

FIG. 32 shows the second pass for completing the value control table. The start positions are found by accumulating the counts, each start position being the cumulative total of the counts that precede it. Moreover, the value of each start position is copied to the corresponding category number. In the figure, the setting of the category numbers is complete.

FIGS. 33-35 show the third pass of data population. In this pass, one value (pointer) at a time is taken from the beginning of the array of pointers to the value control table, and the offset in the array of pointers to the value control table, namely the record number, is stored at the position in the array of pointers to records specified by the category number within the value control table referenced by that value. FIGS. 33, 34 and 35 respectively show the processing of the first, second and last pieces of data of the array of pointers to the value control table of the information block regarding “occupation.”

Note that in the aforementioned explanation, the field of the category number is used as a work area, but any array of integers with a number of elements equal to or greater than the number of rows in the value control table, namely the total number of field value numbers, can also be used as the work area.

On the other hand, the population with new data in the case that categories are not defined in advance is implemented by scanning the data to be populated, acquiring a list of values to be registered in the value control table, and then performing the aforementioned process of population with new data for the case in which the categories are defined.

Next, consider the case of adding an additional “Student” record to the information block regarding “occupation” after the population with new data is complete as shown in FIG. 35. FIG. 36 is an explanatory diagram for this addition of a record.

In this case, the field value number 0, which indicates “Student,” is added to the end of the array of pointers to the value control table, and then the count of students in the value control table is increased by “1.” Next, it is necessary to allocate space within the array of pointers to records for storing the record number of the added record (=1000000), namely its offset in the array of pointers to the value control table. To this end, the value at the end of the portion of the array of pointers to records corresponding to “Student” (in this example, 999999) is extracted, and “1000000,” which is the expansion address, is stored in its place. Its sign may be reversed, for example, in order to identify this as an expansion address, so that it is stored as “−1000000.” Then, in the expansion area, the end value of “999999” which was extracted previously is stored, and finally, the pointer value of “1000000” corresponding to the newly added record is stored.

By adopting this method of adding data, the need to move large amounts of data at the time of adding data is avoided. In addition, in order to suppress the drop in access efficiency arising from the increase in the number of pointers when large numbers of records are added, it is sufficient to repeat the same processing as the third pass of the population with new data at appropriate timing.

FIG. 37 illustrates the structure of an information block according to another embodiment of the present invention. In the case that the structure shown in this figure is adopted, increases in the number of pointers are avoided and changes to data can be performed easily. In FIG. 37, the array of start positions contains addresses that indicate the beginning of the area where the array of pointers to records is disposed. For example, “0” is stored as the start position for the field value of “Student.” On the other hand, for the field value of “Programmer,” the value of “n (where n>455214)” is allocated as the start position.

Next, we shall add an explanation of Embodiment 13 of the present invention. In this embodiment, a sort of records is implemented using the aforementioned information block. FIG. 38 illustrates the initial state of sorting records on the “occupation” field. The raw data shown in this figure is the array of record numbers to be sorted. For example, an array of pointers to records obtained for fields other than “occupation,” or a result set from a search, can be used as the raw data. In the case of this example, for simplicity in the explanation, the record numbers of the raw data are arranged in the order “0, 1, 2, . . . , 9,” but one must note that the order of record numbers prior to the sort will generally be random. The field values in the “occupation” field corresponding to each record number are arranged in the order “Teacher, Programmer, Student, . . . , Other.”

On the right side of this figure, the initial state of the various arrays contained in the information block regarding “occupation” is shown. The information block regarding “occupation” is created by the information block construction method explained with reference to FIGS. 29-35. At the time of performing a sort of records on the “occupation” field, the value control table and the array of pointers to the value control table of an information block regarding “occupation” prepared in advance are used. As the start positions, the start positions set at the time of construction of the information block are used as is. The start positions are copied to the corresponding end positions. The area to contain the end positions may be, for example, the area allocated for the counts (the count array). The array of pointers to the value control table may be prepared in advance in record number order, for example. In the case of this example, the record numbers of the raw data are in ascending order, so the connection between the record numbers of the raw data and the array of pointers to the value control table exhibits a simple relationship. In addition, the array of pointers to records is an array for storing the sorted result set, so an area of the same size as the data to be sorted is allocated. The aforementioned end positions are used to indicate where in the array of pointers to records each sorted result is to be stored.

FIG. 39 is an explanatory diagram illustrating the first step of sorting according to Embodiment 13 of the present invention. In the first step, the beginning record in the raw data (in this example, the one with record number=0) is processed. The field value of the “occupation” field of the record with record number “0” is “Teacher.” At this time, the field value number “2,” which specifies a field value of “Teacher,” is stored in the array of pointers to the value control table corresponding to record number “0.” Then, the value “5” of the end position corresponding to the field value number of “2” is extracted, and this value of “5” is used as an address to set this record number “0” in the 5^(th) position of the array of pointers to records where the sorted result set is stored. Next, the value of the end position corresponding to this field value number of “2” is incremented by “+1,” so “5” is increased to “6.”

FIG. 40 is an explanatory diagram illustrating the second step of sorting according to Embodiment 13 of the present invention. In the second step, the second record in the raw data (in this example, the one with record number=1) is processed. The field value of the “occupation” field of the record with record number “1” is “Programmer.” At this time, the field value number “1,” which specifies a field value of “Programmer,” is stored in the array of pointers to the value control table corresponding to record number “1.” Then, the value “3” of the end position corresponding to the field value number of “1” is extracted, and this value of “3” is used as an address to set this record number “1” in the 3^(rd) position of the array of pointers to records where the sorted result set is stored. Next, the value of the end position corresponding to this field value number of “1” is incremented by “+1,” so “3” is increased to “4.”

Thereafter, the same operation as in the aforementioned first and second steps is repeated for the remaining record numbers of “2, 3, 4, 5, 6, 7, 8, 9” in the raw data. FIG. 41 shows the final state of the sort thus obtained. As can be seen from the sorted result set in this figure, the sort according to Embodiment 13 of the present invention results in the records being sorted in the order of the “occupation” field value numbers, being reordered into the order of “2, 4, 6, 1, 7, 0, 3, 5, 8, 9” by record number.

In the aforementioned explanation of Embodiment 13, the case is envisioned wherein the raw data contains all records in the original table-format data, namely the entire set. However, the sort according to the present invention is also effective on only a portion of the records, namely on a partial set. Here follows an explanation of sorting a partial set in reference to FIGS. 42 and 43.

FIG. 42 is a diagram illustrating the state of completion of the aforementioned sorting on a partial set. In the case of this example, the raw data given consists of a record with a record number of “0” in which the field value of the “occupation” field is “Teacher,” and a record with a record number of “1” in which the field value is “Programmer.” When the sorting is applied to both of these records, sort results as shown by the sorted result set in this figure are obtained. At this time, the result set is contained in the array of pointers to records. Therefore, an area of the same size as the entire set is allocated to store the sort results from a partial set.

Thus, in the case of a sort on a partial set, it is preferable that the result set be compressed to the same size as the partial set. FIG. 43 shows the post-processing for this sorting on a partial set. This post-processing, namely the compression of the result set, comprises taking the difference between the start position and end position for each field value in the value control table, extracting the count and storage position in the sort results corresponding to the field value in question, and then arranging the sort results based on the extracted count and storage position.

Next, we shall again add a general explanation of the aforementioned sorting of the entire set or partial set, using the flowchart in FIG. 50. Here, for both cases of an entire set and a partial set, the raw data is considered to have storage position numbers attached in order starting from the beginning. For example, in FIGS. 38 and 41, the storage position numbers match the record numbers. However, in the case that the order of arrangement of raw data does not follow the record number, the storage position number differs from the record number. Note that, needless to say, the sorting described in detail regarding the aforementioned embodiment can be implemented by the CPU 100 executing a program stored in a stipulated area.

In the sorting, first, the storage position number is initialized (Step 5001). Next, the corresponding pointer within the array of pointers to the value control table is accessed for a certain storage position number (Step 5002), and then the value of the end position corresponding to the field value number specified by the pointer is identified (Step 5003).

Thereafter, the corresponding record number is stored at the position within the array of pointers to records identified by the aforementioned end position (Step 5004). Then, the value of the end position identified in Step 5003 is incremented (Step 5005). The processing in the aforementioned Steps 5002 through 5005 is performed for all of the raw data (see Steps 5006 and 5007), and thereby, it is possible to obtain an array of pointers to records containing the stipulated record numbers. One can see that the example of FIG. 39 corresponds to Step 5002 through Step 5005 in the case that the storage position number is “0,” and one can also see that FIG. 40 corresponds to Step 5002 through Step 5005 in the case that the storage position number is “1.” In addition, in the case of sorting a partial set, after Steps 5001 through 5007 are executed, it is sufficient to compress the result set by means of the sort post-processing (see Step 5008).
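Steps 5001 through 5008 can be sketched as follows. The function names `sort_by_field` and `compress` are hypothetical. The occupation pointers for records “0,” “1,” “2” and “9” follow the text; the remaining entries are assumed, chosen to be consistent with the sorted order “2, 4, 6, 1, 7, 0, 3, 5, 8, 9” reported above.

```python
def sort_by_field(raw_records, ptr_to_vct, start_positions):
    """Place each record number at the end position of its field value number."""
    end_positions = list(start_positions)           # start positions copied to end positions
    ptr_to_records = [None] * len(ptr_to_vct)       # area the size of the entire set
    for record_no in raw_records:                   # Steps 5001, 5006, 5007
        q = ptr_to_vct[record_no]                   # Step 5002
        position = end_positions[q]                 # Step 5003
        ptr_to_records[position] = record_no        # Step 5004
        end_positions[q] += 1                       # Step 5005
    return ptr_to_records, end_positions


def compress(ptr_to_records, start_positions, end_positions):
    """Step 5008: keep only the filled span between each start and end position."""
    result = []
    for start, end in zip(start_positions, end_positions):
        result.extend(ptr_to_records[start:end])
    return result


# 0=Student, 1=Programmer, 2=Teacher, 3=Other (partly assumed data, see lead-in).
occ_ptr = [2, 1, 0, 3, 0, 3, 0, 1, 3, 3]
starts = [0, 3, 5, 6]

full_sorted, _ = sort_by_field(range(10), occ_ptr, starts)

partial_area, partial_ends = sort_by_field([0, 1], occ_ptr, starts)
partial_sorted = compress(partial_area, starts, partial_ends)
```

For the partial set consisting of records “0” and “1,” the compressed result lists the “Programmer” record before the “Teacher” record.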

The sort according to the aforementioned Embodiment 13 of the present invention is a so-called “ascending order” sort, namely a sort wherein the sort results are arranged in order of increasing field value numbers of the sorted field values. However, the sort results may also be arranged in “descending order,” wherein they are arranged in order of decreasing field value numbers of the sorted field values. The “descending order” sort is implemented by modifying the start positions used in the case of an “ascending order” sort. In the case of this example, the start positions for an “ascending order” sort are as follows:

Student      0
Programmer   3
Teacher      5
Other        6

In contrast, the start positions for a “descending order” sort become:

Other        0
Teacher      10 − 6 = 4
Programmer   10 − 5 = 5
Student      10 − 3 = 7
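The descending start positions can equally be obtained by accumulating the counts from the last field value number toward the first, as in the following sketch. The function name `descending_start_positions` is hypothetical, and the counts of 3, 2, 1 and 4 are derived here from the ascending start positions and the total of 10 records.

```python
def descending_start_positions(counts):
    """Accumulate counts from the last field value number down to the first."""
    starts = [0] * len(counts)
    total = 0
    for n in reversed(range(len(counts))):
        starts[n] = total
        total += counts[n]
    return starts


# Counts for Student, Programmer, Teacher, Other (derived as noted above).
counts = [3, 2, 1, 4]
descending = descending_start_positions(counts)
```

Each result equals the total number of records minus the ascending start position of the following field value, which is the subtraction shown above.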

A sort performed according to Embodiment 13 of the present invention as such has the following advantages.

First, a high-speed sort is achieved. For example, in an operating environment using a Pentium Pro® 200 MHz processor and Windows 95®, the novel sort according to the present invention achieved a sort time of 145 ms for 1 million records. In contrast, in the case of a conventional Quicksort, the time required to sort 1 million integers was 1530 ms.

Second, constant performance is obtained regardless of the type of field values stored as values. The present sort gives performance identical to that for integers even when the data type of the field values is text or floating point, for example. In contrast, in the case of the conventional Quicksort or other algorithms, the speed is highest when the type of data handled is integer and lowest when the type is variable-length text.

Third, the sorting speed does not drop even if the data size increases. With this sort, the sorting time is expressed by O(n), where n is the data size. On the other hand, with Quicksort or other conventional high-speed sorting methods, the sorting time is O(n·log(n)), for example.

Fourth, sorts on multiple fields can be divided into sorts on each field. For example, in Embodiment 13 of the present invention described with reference to FIGS. 38-41, among the raw data, records corresponding to a field value of “Student” are arranged in the order record number “2,” record number “4,” and record number “6.” And this order of record numbers (namely, record number “2,” record number “4,” and record number “6”) is preserved in the final sorted result set. This means that the order of records in the sorted result reflects the order of records prior to sorting within the scope of satisfying the purpose of the sort. By taking advantage of this characteristic of the present sort, a sort on multiple fields can be achieved by performing sequential sorts on each individual field. In contrast, with the conventional Quicksort, the state prior to sorting is known not to be reflected in the order of the sort results.
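The following Python sketch illustrates how an order-preserving sort can be chained to sort on two fields. The field names, the field value numbers, and the choice to sort on the lower-priority field first are assumptions made for illustration; the text above states only that sequential sorts on the individual fields suffice.

```python
def stable_sort(order, value_numbers, n_values):
    """Rearrange the record numbers in `order` by field value number,
    keeping the incoming order among records with equal numbers."""
    counts = [0] * n_values
    for record in order:
        counts[value_numbers[record]] += 1
    position = [0] * n_values
    total = 0
    for v in range(n_values):
        position[v] = total
        total += counts[v]
    result = [0] * len(order)
    for record in order:
        v = value_numbers[record]
        result[position[v]] = record
        position[v] += 1
    return result


# Hypothetical field value numbers for six records.
sex = [1, 0, 1, 0, 1, 0]   # 0 = Female, 1 = Male (assumed)
age = [2, 0, 1, 2, 0, 1]   # three distinct ages (assumed)

order = list(range(6))
order = stable_sort(order, age, 3)   # lower-priority field first
order = stable_sort(order, sex, 2)   # higher-priority field last
```

Because the second pass keeps the age order within each sex, the result is the same as a single sort on the pair (sex, age).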

SPECIFIC EXAMPLES

In the aforementioned various embodiments of the present invention, the value control table contains a value list of the field values. For example, in the example shown in FIG. 7, the field value column contains a list including the values “16,” “17,” “18,” . . . . In addition, the value control table includes category numbers set for each field value number. Here follows an explanation of a specific example illustrating how the combination of such a value list and category number for a field value can be used to determine immediately whether or not multiple values for a certain field match search conditions by means of several comparative judgments.

For example, consider the case wherein the following list is given as the value list, sorted by the magnitude of the value:

0.1, 0.2, . . . , 100.0, 100.1, 100.2, . . . , 1000.0

and “value is greater than 100” is given as the search condition.

First, from the value list, find the largest value that does not satisfy the condition (in this example, 100). Then, set “0” as the category number for that value and for all the values before it in the value list. In addition, set “1” as the category number for all values in the value list after “100.” Thereby, once this boundary value has been found, the category numbers are set without performing any further comparison operations, and thus a field value or field value number having a value satisfying the search conditions can be obtained.

By using the bisection method or other methods known in the prior art, the boundary value can be found with a small number of comparison operations. For example, if there are N variations in the values present in the value list, then the number of comparison operations required to find the boundary value is roughly log₂(N).

In contrast, in the event that the value list is not sorted on the magnitude of the value, if there are similarly N variations in the values present in the value list, then N comparison operations are required to find the value that satisfies the search condition.

Since the value list is sorted and category numbers are set in the value control table in this manner, the determination of whether or not the stipulated search conditions are met can be speeded up.
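As a hedged illustration of this example, the following Python sketch locates the boundary in a sorted value list by bisection and then fills in the category numbers without further comparisons. The step of 0.1 between values, the function name, and the use of the standard `bisect` module are assumptions; the text above specifies only the end points of the list and the search condition.

```python
from bisect import bisect_right


def category_numbers_greater_than(value_list, threshold):
    """Return 0 for values not exceeding threshold and 1 for the rest.

    value_list must be sorted in increasing order. The boundary is
    found with roughly log2(N) comparisons; the category numbers are
    then assigned by position alone.
    """
    boundary = bisect_right(value_list, threshold)
    return [0] * boundary + [1] * (len(value_list) - boundary)


# Assumed value list: 0.1, 0.2, ..., 1000.0 in steps of 0.1.
value_list = [i / 10 for i in range(1, 10001)]
categories = category_numbers_greater_than(value_list, 100)
```

With the category numbers in place, a record matches the condition exactly when the category number stored for its field value number is 1.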

In addition, as would be naturally understandable to a person skilled in the art, the content of the aforementioned value list and the search conditions are no more than a single specific example for explaining this example, and according to the present invention, the determination of whether or not the stipulated search conditions are met can be speeded up for various value lists and for various combinations of search conditions.

Next, we shall describe tests of searching for and tabulating 1 million records of data. The platform used in the tests was an ordinary personal computer equipped with a Pentium Pro® 200 MHz processor and 128 MB of memory. FIG. 44 is a table showing the data used in the tests. The data consisted of one million numbers in the range from “000000” to “999999” in the form of table-format data divided into three fields consisting of the 10,000's unit, the 100's unit and the 1's unit. Field values in the range from “00” to “99” appear 10,000 times apiece for each field.
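The structure of this test data can be reproduced with the short Python sketch below, which splits each six-digit number into three two-digit fields and counts how often each field value occurs. The function name and the use of `collections.Counter` are assumptions for illustration and say nothing about how the original tests were implemented.

```python
from collections import Counter


def split_fields(number):
    """Split a number 0..999999 into its x10,000, x100 and x1 fields."""
    return number // 10000, (number // 100) % 100, number % 100


records = [split_fields(n) for n in range(1_000_000)]

# Occurrence count of each field value, per field.
counts_per_field = [Counter(record[f] for record in records) for f in range(3)]
```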

FIG. 45 is a list of the test results showing the time required to search for/tabulate 1 million records, measured depending on the result set type. The result set type is one of two types, namely the aforementioned bit flag type and array of pointers type. The times in the test results are given in units of milliseconds (ms; 1/1000 of a second).

The search performed in the aforementioned test is a search of an AND of multiple fields, by connecting the three fields of “×10,000,” “×100” and “×1” with an AND condition. The search was a cascade of the fields “×10,000,” “×100” and “×1” in this order. The intermediate and final result sets from the search take the form of a bit flag or array of pointers as described above. The measured times are given as the average of five measurements.
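A cascaded AND search of this kind can be sketched in Python as follows. The three search conditions, the function names, and the representation of the bit flag result set as one byte per record are all assumptions for illustration; the conditions actually used in the tests are not stated here.

```python
N = 1_000_000
x10000 = [n // 10000 for n in range(N)]
x100 = [(n // 100) % 100 for n in range(N)]
x1 = [n % 100 for n in range(N)]


def search_first(field, predicate):
    """Scan every record of one field; return matches as record numbers."""
    return [record for record, value in enumerate(field) if predicate(value)]


def search_next(result, field, predicate):
    """Narrow an existing result set using a condition on another field."""
    return [record for record in result if predicate(field[record])]


def to_flags(result, n_records):
    """Convert an array of record numbers to a flag-per-record result set."""
    flags = bytearray(n_records)
    for record in result:
        flags[record] = 1
    return flags


# Hypothetical conditions, applied in the order x10,000, x100, x1.
step1 = search_first(x10000, lambda v: v < 10)
step2 = search_next(step1, x100, lambda v: v >= 50)
final = search_next(step2, x1, lambda v: v == 7)
flags = to_flags(final, N)
```

Each stage after the first examines only the records that survived the previous stage, which is why the intermediate result sets shrink from 100,000 to 50,000 to 500 records in this sketch.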

In addition, the tabulation in these tests consists of counting the number of times the various values (00 through 99) of the “×100” and “×1” fields appear in the result sets obtained from the search tests. The size of the table for this cross-tabulation is (100×100=) 10,000 cells. The times are given in units of milliseconds and the average of 5 measurements is given as the measured times.
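The cross-tabulation described here can be sketched in Python as below. The result set used (all records whose “×10,000” field equals 42) and the function name are hypothetical; they stand in for whatever result set a preceding search produced.

```python
N = 1_000_000
x10000 = [n // 10000 for n in range(N)]
x100 = [(n // 100) % 100 for n in range(N)]
x1 = [n % 100 for n in range(N)]


def cross_tabulate(result, field_a, field_b, size_a, size_b):
    """Count occurrences of each (field_a, field_b) pair in a result set."""
    table = [[0] * size_b for _ in range(size_a)]
    for record in result:
        table[field_a[record]][field_b[record]] += 1
    return table


# Hypothetical result set given as an array of record numbers.
result = [record for record in range(N) if x10000[record] == 42]
table = cross_tabulate(result, x100, x1, 100, 100)
```

The table has 100 × 100 cells, and each record in the result set is visited exactly once.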

The constitution of the searching and tabulating system for table-format data is in no way limited to the examples described in the aforementioned embodiments, but rather the various constituent elements of the searching and tabulating system may be implemented in software (program), stored on a disk device or the like and, if necessary, the searching and tabulating system can be installed on a computer to perform the searching and tabulating of table-format data. Moreover, the program thus implemented may be stored on a floppy disk or CD-ROM or other portable storage medium, and can be used in a general purpose fashion in a situation in which such a system is used.

The present invention is in no way limited to the aforementioned embodiments but various modifications are possible within the scope of the invention recited in the patent claims, and it goes without saying that these are also included within the scope of the present invention.

For example, in the AND searches and OR searches illustrated in the aforementioned Embodiment 2, searches were executed on two fields, but this is not a limitation, since it is clear that searches on three or more fields can also be implemented.

In addition, in the aforementioned Embodiment 7 and Embodiment 8, a two-dimensional array was generated in order to perform a tabulation on two fields, but this is not a limitation, as it is possible to generate a three-dimensional or higher-dimensional array in order to perform a tabulation on three or more fields, and it goes without saying that these can be used to perform the aforementioned tabulation. Considering the tabulation of three fields, field value numbers q₁, q₂, q₃ in each of the three information blocks are extracted, and this is used to identify one element P(q₁, q₂, q₃) in the three-dimensional array.
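A tabulation on three fields along these lines might be sketched in Python as follows. The field names, the sample field value numbers, and the choice of a flat list addressed by a computed index (rather than a nested array) are assumptions for illustration only.

```python
def flat_index(qs, sizes):
    """Map field value numbers (q1, q2, ..., qn) to one array position."""
    index = 0
    for q, size in zip(qs, sizes):
        index = index * size + q
    return index


def tabulate(fields, sizes):
    """Count records per combination of field value numbers."""
    total = 1
    for size in sizes:
        total *= size
    P = [0] * total                      # initialize every element to 0
    for qs in zip(*fields):              # one tuple of numbers per record
        P[flat_index(qs, sizes)] += 1
    return P


# Hypothetical field value numbers for five records.
sex = [0, 1, 1, 0, 1]          # 2 distinct values (assumed)
age = [2, 0, 0, 2, 1]          # 3 distinct values (assumed)
occupation = [3, 1, 1, 3, 0]   # 4 distinct values (assumed)

sizes = (2, 3, 4)
P = tabulate([sex, age, occupation], sizes)
```

Here `P[flat_index((q1, q2, q3), sizes)]` plays the role of the element P(q₁, q₂, q₃), and the same two functions work unchanged for any number of fields.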

Moreover, regarding the aforementioned Embodiment 9 also, it goes without saying that it is possible to perform tabulation on three or more fields in the same manner as in Embodiment 7 and Embodiment 8.

In addition, while the searching, tabulating and/or sorting are implemented by reading a stipulated program into an ordinary computer system and then executing the program in the aforementioned embodiments, the present invention is in no way limited to this, but rather it goes without saying that it may be constituted such that a board computer used exclusively for data processing is connected to a personal computer or other ordinary computer system, and this board computer can execute the aforementioned processing. Therefore, in this specification, the word “means” does not necessarily mean a physical means, but rather it includes the case in which the functions of the various means are implemented by software and the case in which some or all of the functions are implemented by hardware. Moreover, the functions of a single means may be implemented by two or more physical means, or the functions of two or more means may be implemented by one physical means. According to the aforementioned description, by means of the present invention, it is possible to process large amounts of data expressed in table format without using the conventional data tables which required long access times, so the speed of tabulating and searching can be greatly increased.

FIELD OF THE INVENTION

The present invention is particularly suited for use in systems that handle large amounts of data, for example, databases and data warehouses. More specifically, it is suited to large-scale scientific and technical calculation, control systems for plants and power supply and the like, methods of planning of delivery and resource distribution, and to order management and the management of clerical work such as securities trading.

What is claimed is:
 1. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a method of extracting from said table-format data the field value corresponding to a specific field and a specific record, said method being characterized in comprising the steps of: keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, acquiring from said field value number-specifying information array the field value number corresponding to said specific record, and obtaining from the field values stored in said value control table the field value corresponding to the field value number acquired as above.
 2. The method according to claim 1, characterized in that, in order to categorize the field values corresponding to said field value number, category numbers are stored in said value control table corresponding to said field value number, and said category numbers are accessed at the time of obtaining the field value corresponding to said field value number.
 3. The method according to claim 1, characterized in that said information that specifies the field value number is the field value number itself.
 4. The method according to claim 1, characterized in that said information that specifies the field value number is a binary value wherein 1 bit is allocated to each field value number, thus setting whether or not it is set.
 5. A computer program product that is loadable into the memory of a computer, and that implements a method according to claim 1.
 6. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a method of searching through said table-format data for field values that match specific search conditions, said method being characterized in comprising the steps of: keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, setting search conditions containing a specific field and the field value to be searched for in said field, examining the field value numbers within the corresponding field value number-specifying information array in the order of said records, regarding the field related to said search conditions, determining whether or not the field values in said value control table specified by said field value number match said search conditions thus set, and accumulating records that match said search conditions as a result set.
 7. The method according to claim 6, characterized in comprising the steps of: keeping in a storage device the result set of records that match said search conditions, regarding fields related to other search conditions, acquiring from said field value number-specifying information array regarding said other fields the field value number corresponding to records that match said search conditions set in said result set, regarding said other fields, determining whether or not the field values identified by said extracted field value numbers match said other search conditions, regarding said other fields, if the field values identified by said extracted field value numbers match said other search conditions, extracting said records corresponding to said field value numbers as records that match said separate search conditions, and if necessary, extracting said field value numbers with respect to still other fields regarding still other search conditions, and repeating the determination of matching and extraction of records.
 8. The method according to claim 6, characterized in comprising the steps of: keeping in a storage device the result set of records that match said search conditions, regarding fields related to other search conditions, using field values within the field values stored in other value control tables that match said search conditions and record identification information-specifying information corresponding to related field values to extract from a record identification information array the records that match said other search conditions, and store the records that match the search conditions in a specified other record set, if necessary, regarding still other search conditions, using still other record identification information-specifying information to extract records that match still other search conditions, and repeating the storage of still other result sets, and obtaining a final result set by eliminating duplicate records from the result sets thus obtained.
 9. The method according to claim 6, characterized in that said value control table comprises, for each of said field value numbers, a start position that indicates the starting point of said exclusive area, and a count that indicates the number of records that have identical field value numbers, and a stipulated record identification information array is identified by accessing said start position and count.
 10. The method according to claim 6, characterized in that the category numbers for categorizing the field values corresponding to said field value numbers in said value control table are stored corresponding to said field value number, and said category number is used to identify the field values that match the search conditions.
 11. The method according to claim 6, characterized in that said information that specifies the field value number is the field value number itself.
 12. The method according to claim 6, characterized in that said information that specifies the field value number is a binary value wherein 1 bit is allocated to each field value number, thus setting whether or not it is set.
 13. A computer program product that is loadable into the memory of a computer, and that implements a method according to claim 6.
 14. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a method of tabulating said table-format data, said method being characterized in comprising the steps of: if n represents an integer equal to 1 or greater, for each of n fields used in tabulation, keeping in a storage device individual field information consisting of a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, if i represents an integer in the range 1≦i≦n, for the i^(th) individual information field, the total number of said field value numbers is represented by N_(i), k_(i) represents an integer in the range 0≦k_(i)≦N_(i)−1, M represents an integer equal to 1 or greater, and if m is an integer in the range 1≦m≦M, then initializing elements P_(m)(k₁, k₂, . . . , k_(i), . . . , k_(n)) of n-dimensional M data spaces having a size of N₁×N₂× . . . ×N_(i)× . . . ×N_(n), for said n individual information fields, when j represents an integer in the range 0≦j≦(total number of records)−1, extracting the respective field value numbers stored in the j^(th) position in each field value number-specifying information array, and when the field value number extracted from the i^(th) individual information field is represented by q_(i), identifying the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)) of said data space, and processing said identified values of the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)).
 15. The method according to claim 14, characterized in that said information that specifies the field value number is the field value number itself.
 16. The method according to claim 14, characterized in that said information that specifies the field value number is a binary value wherein 1 bit is allocated to each field value number, thus setting whether or not it is set.
 17. A computer program product that is loadable into the memory of a computer, and that implements a method according to claim 14.
 18. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a method of tabulating said table-format data, said method being characterized in comprising the steps of: if n represents an integer equal to 1 or greater, for each of n fields used in tabulation, keeping in a storage device individual field information consisting of a value control table containing field values for that field and the category number of said field value corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, if i represents an integer in the range 1≦i≦n, for the i^(th) individual information field, the total number of either said field value numbers or said category numbers is represented by N_(i), k_(i) represents an integer in the range 0≦k_(i)≦N_(i)−1, M represents an integer equal to 1 or greater, and if m is an integer in the range 1≦m≦M, then initializing elements P_(m)(k₁, k₂, . . . , k_(i), . . . , k_(n)) of n-dimensional M data spaces having a size of N₁×N₂× . . . ×N_(i)× . . . ×N_(n), for said n individual information fields, when j represents an integer in the range 0≦j≦(total number of records)−1, extracting the respective field value numbers stored in the j^(th) position in each field value number-specifying information array, and when the field value number extracted from the i^(th) individual information field or the category number stored corresponding to said field value number in the value control table of said i^(th) individual information field is represented by q_(i), identifying the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)) of said data space, and processing said identified values of the elements P_(m)(q₁, q₂, . . . , q_(i), . . . , q_(n)).
 19. The method according to claim 18, characterized in that M=1 is true, and the step of processing the value of said identified element P_(m) consists of adding 1 to the current value of said element P_(m).
 20. The method according to claim 18, characterized in that the step of processing the value of said identified element P_(m) consists of: for at least one element P_(m) among the M elements P_(m), for separate individual field information kept in a storage device, acquiring the field value numbers stored in the j^(th) position in the field value number-specifying information array, from among the field values stored in the value control table of said separate individual field information, acquiring the field value corresponding to said field value number thus acquired, and updating the current value of said element P_(m) and the value of said element P_(m) in combination with said field value thus obtained.
 21. The method according to claim 18, characterized in that said information that specifies the field value number is the field value number itself.
 22. The method according to claim 18, characterized in that said information that specifies the field value number is a binary value wherein 1 bit is allocated to each field value number, thus setting whether or not it is set.
 23. A computer program product that is loadable into the memory of a computer, and that implements a method according to claim 18.
 24. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, an apparatus for searching for and tabulating said table-format data, said apparatus being characterized in comprising: a storage device for keeping, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, field value number acquisition means for acquiring from said field value number-specifying information array kept on said storage device the field value number corresponding to said specific record, and field value obtaining means for obtaining from the field values stored in said value control table kept on said storage device the field value corresponding to the field value number acquired as above.
 25. The apparatus according to claim 24, characterized in that said storage device keeps individual field information that has said value control table, said field value number-specifying information array, and a record identification information array storing in exclusive areas for each of said field value number one or more pieces of record identification information related to identical field value numbers, and said value control table includes, for each of said field value numbers, record identification information-specifying information that indicates the area where said one or more pieces of record identification information related to identical field value numbers in said record identification information array, and furthermore, has search means for using said record identification information-specifying information corresponding to field value numbers related to field values within the field values contained in said value control table that match stipulated search conditions, to acquire record identification information from said record identification information array that matches said search conditions.
 26. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a computer-readable storage medium upon which is recorded a program for searching for and tabulating said table-format data, said storage medium being recorded with a program characterized in comprising: a step of keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, a step of acquiring from said field value number-specifying information array kept on said storage device the field value number corresponding to said specific record, and a step of obtaining from the field values stored in said value control table kept on said storage device the field value corresponding to the field value number acquired as above.
 27. When table-format data is represented as an array of records consisting of a plurality of fields containing field values for each field, a computer-readable storage medium upon which is recorded a program for searching through said table-format data for records that contain a field value that matches search conditions, said program being characterized in comprising: a step of keeping in a storage device, for each individual field, individual field information such that includes a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, and a record identification information array storing in exclusive areas for each of said field value numbers one or more pieces of record identification information related to identical field value numbers, and said value control table includes, for each of said field value numbers, record identification information-specifying information that indicates the area where said one or more pieces of record identification information related to identical field value numbers in said record identification information array, and a step of using said record identification information-specifying information corresponding to field value numbers related to field values within the field values contained in said value control table that match said search conditions, to acquire record identification information from said record identification information array that matches said search conditions.
 28. A sorting method whereby an array of record identification information specifying records consisting of a plurality of fields containing field values for each field is rearranged on a specific field, said method being characterized in comprising the steps of: keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, the value control table further including record identification information-specifying information that, corresponding to said field value number, indicates the area in said record identification information-specifying information array where said one or more pieces of record identification information regarding identical field value numbers are stored, for each of said records, associating said record identification information with field value numbers corresponding to the field values of said fields, for each of said field value numbers, defining the storage location after reordering said record identification information, sequentially extracting said record identification information from said array, determining said field value number corresponding to said record identification information thus extracted, storing said record identification information thus extracted in said storage location according to the record identification information-specifying information corresponding to the field value number thus determined, and updating said storage location where said record identification information is to be stored, in order to store the next record identification information.
 29. The method according to claim 28, characterized in that said record identification information-specifying information comprises a start position that indicates the starting point of the area of said storage location, and an end position that is initially equivalent to the start position and whose value is incremented upon said update.
 30. A computer program product that is loadable into the memory of a computer, and that implements a method according to claim 28.
 31. A sorting apparatus that rearranges an array of record identification information specifying records consisting of a plurality of fields containing field values for each field on a specific field, said apparatus being characterized in comprising: a storage device for keeping, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, the value control table further including record identification information-specifying information that, corresponding to said field value number, indicates the area in said record identification information-specifying information array where said one or more pieces of record identification information regarding identical field value numbers are stored, for each of said records, means of associating said record identification information with field value numbers corresponding to the field values of said fields, for each of said field value numbers, definition means for defining the storage location after reordering said record identification information, field value number determination means for sequentially extracting said record identification information from said array and determining said field value number corresponding to the record identification information thus extracted, record identification information storage means for storing said record identification information thus extracted in said storage location according to the record identification information-specifying information corresponding to the field value number thus determined, and updating means for updating said storage location where said record identification information is to be stored, in order to store the next record identification information.
 32. A computer-readable storage medium upon which is recorded a sorting program for rearranging on a specific field an array of record identification information specifying records consisting of a plurality of fields containing field values for each field, said sorting program being characterized in comprising: a step of keeping in a storage device, for each individual field, a value control table containing field values for that field corresponding to a field value number that uniquely identifies said field value, which is a field value number that is common to the various fields and has a stipulated order from an initial value, and a field value number-specifying information array containing information that specifies said field value numbers in the order of said records, wherein the value control table further includes record identification information-specifying information that, corresponding to said field value number, indicates the area in said record identification information-specifying information array where said one or more pieces of record identification information regarding identical field value numbers are stored, for each of said records, a step of associating said record identification information with field value numbers corresponding to the field values of said fields, for each of said field value numbers, a step of defining the storage location after reordering said record identification information, a step of sequentially extracting said record identification information from said array, a step of determining said field value number corresponding to said record identification information thus extracted, a step of storing said record identification information thus extracted in said storage location according to the record identification information-specifying information corresponding to the field value number thus determined, and a step of updating said storage location where said record identification information is to be stored, in order to store the next record identification information.